This is an automated archive made by the Lemmit Bot.

The original was posted on /r/machinelearning by /u/NoIdeaAbaout on 2024-09-06 10:29:12+00:00.


I think this issue has been debated for a long time, but two interesting articles have recently come out on it that I would like to take as a starting point for a discussion of RAG vs. long-context LLMs.

In summary, if we can put everything in the prompt, we don’t need to do retrieval. However, I really doubt that we can have a model with a context length that covers the huge amount of data any organization has (and without horrendous computational cost).
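To make the contrast concrete, here is a minimal sketch of what the retrieval step buys us: instead of stuffing the whole corpus into the prompt, we score chunks against the query and keep only the top-k. The toy term-frequency cosine similarity and the helper names (`embed`, `retrieve`, `k`) are my own illustrative stand-ins, not any particular library's API; a real pipeline would use a dense embedding model and a vector index.

```python
# Minimal RAG-style retrieval sketch: keep only the top-k chunks most
# similar to the query instead of putting every document in the prompt.
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real system would use a dense encoder.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

chunks = [
    "Our refund policy allows returns within 30 days.",
    "The cafeteria menu changes every Monday.",
    "Refunds are issued to the original payment method.",
    "Parking passes can be renewed online.",
]
context = retrieve("How do refunds work?", chunks, k=2)
prompt = "Answer using only this context:\n" + "\n".join(context) + "\nQ: How do refunds work?"
print(prompt)
```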

In any case, the reports that LC-LLMs work better at QA have been unconvincing (at least so far, I have not read an article that convinced me that LC-LLMs work better than RAG).

Two articles came out discussing the impact of noise in LLM and RAG:

  • The first states that noise bumps the performance of an LLM and goes to great lengths to characterize this.
  • The second compares RAG and LC-LLMs and shows that as the context size increases, performance first spikes (as relevant chunks are added) and then decreases, because the LLM has a harder time finding the correct information.

I think the reason we will eventually keep RAG is, more or less, that LLMs are sophisticated neural networks and therefore pattern recognition machines. In the end, optimizing signal-to-noise is one of the most common (and sometimes most difficult) tasks in machine learning. When we increase this noise too much, the model is eventually bound to start picking up the noise and get distracted from the important information (plus there is also a subtle interplay between the LLM’s parametric memory and the context, and we still don’t know why it sometimes ignores the context).

Second, in my personal opinion, there is also a structural reason: self-attention seeks out relevant relationships, and as the context length grows we tend toward a curse of dimensionality in which spurious relationships are eventually accentuated.
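A back-of-the-envelope way to see this dilution (a toy numpy sketch, assuming a single softmax over the whole window and a fixed query-key score for the one relevant token; not a claim about any specific model): because softmax normalizes over everything in the context, the weight that lands on the relevant key shrinks as more weakly-scored keys are added, even though its raw score never changes.

```python
# Toy illustration of attention dilution as the context fills with noise tokens.
import numpy as np

rng = np.random.default_rng(0)
relevant_score = 3.0  # assumed fixed score between the query and the one relevant token

for n_noise in [10, 100, 1_000, 10_000]:
    noise_scores = rng.normal(0.0, 1.0, size=n_noise)  # weakly related tokens
    scores = np.concatenate([[relevant_score], noise_scores])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()  # softmax over the full context
    print(f"context={n_noise + 1:>6} tokens -> weight on relevant token = {weights[0]:.3f}")
```

Real models have many heads and layers, so this is only the crudest version of the argument, but it is the intuition I have in mind.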

I would like to hear your opinion: for what reasons do you think RAG will not be supplanted, or do you think LC-LLMs will eventually replace it? In the latter case, how can they solve the problem of a huge amount of contextually irrelevant data?