This is an automated archive made by the Lemmit Bot.
The original was posted on /r/machinelearning by /u/dxtros on 2024-03-28 19:55:31.
Abstract: We demonstrate a technique that dynamically adapts the number of documents in a top-k retriever RAG prompt using feedback from the LLM. This enables a 4x cost reduction in RAG LLM question answering while maintaining the same level of accuracy. We also show that the method helps explain the lineage of LLM outputs. The reference implementation works with most models (GPT-4, many local models, the older GPT-3.5 Turbo) and can be adapted to work with most vector databases that expose a top-k retrieval primitive.
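
To make the idea concrete, here is a minimal sketch of the adaptive top-k loop the abstract describes: retrieve a small context, ask the LLM whether it is sufficient, and only widen the retrieval window when it is not. This is not the authors' reference implementation; the interfaces (`vector_db.top_k`, the `llm` callable) are hypothetical stand-ins for any vector database with a top-k primitive and any chat-style LLM client.

```python
def adaptive_rag_answer(question, vector_db, llm, k_start=2, k_max=16):
    """Grow the retrieved context only while the LLM reports it is insufficient."""
    k = k_start
    while True:
        # Hypothetical vector-store call: fetch the k nearest documents.
        docs = vector_db.top_k(question, k=k)
        context = "\n\n".join(d.text for d in docs)
        prompt = (
            "Answer the question using ONLY the context below. "
            "If the context is insufficient, reply exactly INSUFFICIENT.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}"
        )
        answer = llm(prompt).strip()
        # Feedback signal from the LLM: expand the prompt only when needed,
        # so most queries are answered with a small (cheap) context.
        if answer != "INSUFFICIENT" or k >= k_max:
            # The final doc set also serves as lineage for the answer.
            return answer, docs
        k *= 2  # widen the retrieval window and retry
```

Because easy questions terminate at small k, the average prompt is much shorter than a fixed large-k prompt, which is where the cost saving would come from under these assumptions.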
Blog post:
Reference implementation: