
So if that’s not the case, then, for example, we might have a query like “write me a poem” that doesn’t refer to anything particularly concrete, fresh, or current. In that case, you’d probably just get an answer directly from the LLM, without it doing any of these external grounding searches.
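To make that routing step concrete, here’s a minimal sketch of the idea: a query that doesn’t reference anything fresh or time-sensitive is answered straight from the model, and external grounding is skipped entirely. The trigger-word heuristic, function names, and bracketed placeholder answers below are all illustrative assumptions, not any vendor’s real API.

```python
# Hypothetical routing sketch: decide whether a query needs external
# grounding (search/retrieval) or can be answered directly by the LLM.
# The trigger set is a toy heuristic for illustration only.

FRESHNESS_TRIGGERS = {"today", "latest", "current", "news", "price", "score"}

def needs_grounding(query: str) -> bool:
    """Return True if the query likely refers to fresh or current facts."""
    words = set(query.lower().split())
    return bool(words & FRESHNESS_TRIGGERS)

def answer(query: str) -> str:
    if needs_grounding(query):
        # A real system would call a search/retrieval backend here.
        return f"[grounded answer using search results for: {query!r}]"
    # Queries like "write me a poem" fall through to the bare model,
    # whose output is shaped only by its training data.
    return f"[direct LLM answer for: {query!r}]"
```

In a real pipeline this decision is typically made by a classifier or by the model itself, not a keyword list, but the consequence is the same: when grounding is skipped, the only lever left is the training data.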
So if you wanted to influence this response, the only thing you could really do would be to influence the underlying training data of the model.
So to give you a little bit of context, GPT-4 finished training in, I believe, late 2022, and GPT-4.5 finished training in, I believe, maybe August 2024. So that’s a pretty long time gap between model updates, and a lot of people are still running models whose training data is, at this point, four years old or even older.
So there’s essentially no way to influence this quickly. And in the long term, you’re basically trying to influence any content that goes into the training data for these models, which at this point is basically any written content that ends up being ingested by a computer.
It’s incredibly broad. And this obviously doesn’t just mean your site; it could mean external sites, and at this point it can also mean things like books. The underlying training data has become incredibly large.
