This is an automated archive made by the Lemmit Bot.
The original was posted on /r/machinelearning by /u/dxtros on 2024-03-28 19:55:31.
Abstract: We demonstrate a technique that dynamically adapts the number of documents in a top-k retriever RAG prompt using feedback from the LLM. This enables a 4x cost reduction in RAG LLM question answering while maintaining the same level of accuracy. We also show that the method helps explain the lineage of LLM outputs. The reference implementation works with most models (GPT-4, many local models, the older GPT-3.5 Turbo) and can be adapted to work with most vector databases that expose a top-k retrieval primitive.
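
To make the idea concrete, here is a minimal sketch of the adaptive top-k loop the abstract describes: retrieve a small context, ask the LLM whether it is sufficient, and only widen the retrieval window when it is not. This is not the authors' reference implementation; the interfaces (`vector_db.top_k`, the `llm` callable) are hypothetical stand-ins for any vector database with a top-k primitive and any chat-style LLM client.

```python
def adaptive_rag_answer(question, vector_db, llm, k_start=2, k_max=16):
    """Grow the retrieved context only while the LLM reports it is insufficient."""
    k = k_start
    while True:
        # Hypothetical vector-store call: fetch the k nearest documents.
        docs = vector_db.top_k(question, k=k)
        context = "\n\n".join(d.text for d in docs)
        prompt = (
            "Answer the question using ONLY the context below. "
            "If the context is insufficient, reply exactly INSUFFICIENT.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}"
        )
        answer = llm(prompt).strip()
        # Feedback signal from the LLM: expand the prompt only when needed,
        # so most queries are answered with a small (cheap) context.
        if answer != "INSUFFICIENT" or k >= k_max:
            # The final doc set also serves as lineage for the answer.
            return answer, docs
        k *= 2  # widen the retrieval window and retry
```

Because easy questions terminate at small k, the average prompt is much shorter than a fixed large-k prompt, which is where the cost saving would come from under these assumptions.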
Blog post:
Reference implementation: