Why Large Language Models Won't Fix Google

2023-04-02

It is a common meme that Google Search results quality has declined over the years. Despite being sold by many as the fix, Large Language Models (LLMs) such as ChatGPT, Bing Chat and Bard are not on course to improve search engines. The problem is not just technical; it stems from a fundamental misunderstanding of the environment in which search engines operate.

Let's tackle the obvious technical limitation right away: LLMs are statistical token predictors. Even though they do exhibit emergent properties, it is crucial to understand that their objective function is not to arrive at truth, but to predict the most probable next token in a sequence based on their embedding space. In other words, if the dataset on which the model is trained is no more accurate than the index of said search engines, no meaningful gain can be expected.
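
To make that objective concrete, here is a minimal sketch of greedy next-token prediction. The vocabulary, logits and "model" are entirely made up for illustration; real LLMs operate on learned embeddings over tens of thousands of tokens, but the principle is the same: the most probable continuation wins, whether or not it is true.

```python
import numpy as np

# Toy vocabulary for a hypothetical model; purely illustrative.
vocabulary = ["the", "earth", "is", "flat", "round"]

def next_token(logits: np.ndarray) -> str:
    """Greedy decoding: softmax the logits, return the highest-probability token."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return vocabulary[int(np.argmax(probs))]

# If the training corpus contains "the earth is flat" more often than
# "the earth is round", the model dutifully predicts "flat".
logits_from_biased_corpus = np.array([0.1, 0.2, 0.3, 2.5, 1.0])
print(next_token(logits_from_biased_corpus))  # -> "flat"
```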


Google, being the most used search engine in history, is more exposed to SEO manipulation than it has ever been. It is slowly losing the war over organic content despite the billions the company spends on countermeasures.

Today, ChatGPT's exceptional ability to produce organic responses to simple requests suggests there might be a door to a world where algorithmic search is a problem under control. It does tend to hallucinate, sometimes obviously and sometimes not, but those failures seem to stem from data missing from the training set rather than from malice.

This is fundamentally incorrect. ChatGPT's output only looks more organic because adversarial SEO actors have not yet caught up on how to exploit the training, fine-tuning and prompt engineering powering the tech, and certainly not at the industrial scale Google is fighting against.

Alas, our short internet memory has already forgotten why PageRank was considered revolutionary when Google was first introduced to the world: it was much harder to deceive than competing search engines.
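
For the forgetful, here is a back-of-the-envelope sketch of the idea: a page's score depends on who links to it, not on what the page says about itself, which is why keyword stuffing alone could not game it. This is a simplified power iteration on a tiny made-up link graph; the production algorithm has many more refinements.

```python
import numpy as np

# Tiny made-up link graph: adjacency[i][j] = 1 if page i links to page j.
adjacency = np.array([
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 0],
], dtype=float)

def pagerank(adj: np.ndarray, damping: float = 0.85, iterations: int = 50) -> np.ndarray:
    """Simplified PageRank via power iteration."""
    n = adj.shape[0]
    # Each page splits its vote evenly among its outgoing links.
    out_degree = adj.sum(axis=1, keepdims=True)
    out_degree[out_degree == 0] = 1  # avoid division by zero for dangling pages
    transition = adj / out_degree
    rank = np.full(n, 1.0 / n)
    for _ in range(iterations):
        rank = (1 - damping) / n + damping * transition.T @ rank
    return rank

print(pagerank(adjacency))  # page 2 ranks highest: everyone links to it
```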

In a world where a ChatGPT becomes the most common gateway to access the web, adversarial actors will quickly catch up on its limits, and we'll be back to the good ol' mediocre results.


The more important reason search engine quality has declined is the pressure to generate profits. Google, as a publicly traded company, derives most of its revenue from advertising. This means there is a financial incentive to prioritize ads over organic search results. Over the years, this has meant more and more ads appearing in search results, pushing organic results further down the page.

LLMs, however, do not face the same pressure to generate profits. At least not yet. They are still seen as a technology under development, and as such are spared the financial pressure weighing on traditional search engines. That is likely to change once LLMs become widely adopted: at that point they will face the same incentives, and will likely start prioritizing ads over organic responses, just like traditional search engines do.

Appendix: This is only the beginning: https://twitter.com/debarghya_das/status/1640892791923572737