Adaptive RAG is a reinforcement-learning-based framework that enables retrieval-augmented generation (RAG) systems to decide dynamically when to retrieve and how much context to use. Unlike static RAG pipelines, which fetch a fixed number of documents for every query, this approach models retrieval as a sequential decision process, improving both efficiency and accuracy across model scales. At each step, the system:
1. Encode query and current reasoning state into a latent representation
2. Policy decides: retrieve or continue generation
3. If retrieval triggered, dynamically select top-k documents from retriever
4. Fuse retrieved context with current prompt
5. Generate intermediate or final response using LLM
6. Update policy using reward signals based on performance and cost
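The six-step loop above can be sketched in plain Python. The `retriever`, `generator`, and `policy` stubs below are illustrative placeholders standing in for the real retriever, LLM, and learned policy, not the framework's actual API:

```python
# Toy stand-ins for the real components; names and behavior are illustrative.
def retriever(query, k):
    """Pretend retriever: returns k placeholder documents."""
    return [f"doc-{i} for '{query}'" for i in range(k)]

def generator(query, context):
    """Pretend LLM call: in reality this conditions on the fused prompt."""
    return f"answer({query}, docs={len(context)})"

def policy(state):
    """Pretend learned policy: retrieve until three documents are in context."""
    n_docs = state[0]
    return "retrieve" if n_docs < 3 else "generate"

def adaptive_rag(query, max_steps=10):
    """One pass through steps 1-5; the policy update (step 6) happens offline."""
    context, answer = [], ""
    for _ in range(max_steps):
        state = (len(context), len(answer))   # step 1: encode query + state
        action = policy(state)                # step 2: retrieve or continue
        if action == "retrieve":
            context += retriever(query, k=1)  # steps 3-4: select and fuse docs
            continue
        answer = generator(query, context)    # step 5: generate response
        break
    return answer, len(context)

answer, n_docs = adaptive_rag("who wrote Hamlet?")
```

With the stub policy, the loop retrieves three documents and then generates; a trained policy would instead vary the stopping point per query.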
The problem is formulated as a Markov Decision Process (MDP), where each state captures the current query, the retrieved context, and the partial generation. A Deep Q-Network (DQN) is trained to select retrieval actions, weighing the value of additional context against its computational cost. This enables adaptive behavior across diverse queries and model capacities.
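The MDP formulation can be illustrated with a deliberately tiny example. Here the state is just the number of retrieved documents, the environment rewards a correct answer (assumed to require at least two documents) minus a per-retrieval cost, and a tabular Q table stands in for the DQN; all numbers are illustrative, not the framework's actual values:

```python
ACTIONS = ["retrieve", "generate"]
ALPHA, GAMMA = 0.5, 0.9  # learning rate, discount factor (illustrative)

def env_step(n_docs, action):
    """Toy MDP: the answer is correct once >= 2 documents are in context;
    each retrieval costs 0.1; generating ends the episode."""
    if action == "retrieve":
        return n_docs + 1, -0.1, False
    correct = 1.0 if n_docs >= 2 else 0.0
    return n_docs, correct, True

Q = {}  # (n_docs, action) -> value; a table stands in for the DQN

def train(sweeps=100):
    """Q-learning updates applied in deterministic sweeps over the small
    state space (a real DQN would sample transitions from experience)."""
    for _ in range(sweeps):
        for n_docs in range(5):
            for a in ACTIONS:
                nxt, r, done = env_step(n_docs, a)
                best_next = 0.0 if done else max(Q.get((nxt, x), 0.0) for x in ACTIONS)
                old = Q.get((n_docs, a), 0.0)
                Q[(n_docs, a)] = old + ALPHA * (r + GAMMA * best_next - old)

train()
greedy = lambda s: max(ACTIONS, key=lambda a: Q.get((s, a), 0.0))
```

After training, the greedy policy retrieves from states 0 and 1 but generates once two documents are in context, i.e. it learns to stop retrieving as soon as the expected accuracy gain no longer justifies the cost.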
The reward signal combines three terms:
• Accuracy Reward: correctness of the final answer
• Retrieval Cost Penalty: penalizes excessive document retrieval
• Efficiency Term: encourages minimal yet sufficient context usage
• Combined objective promotes cost-aware reasoning
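The combined objective can be written as a simple weighted sum. The weights below are illustrative assumptions, not the framework's tuned values:

```python
def reward(correct, n_docs, gen_tokens,
           w_acc=1.0, w_retr=0.05, w_eff=0.001):
    """Combined reward: accuracy minus retrieval and context-usage costs.
    All weights are illustrative placeholders."""
    accuracy = w_acc * (1.0 if correct else 0.0)   # accuracy reward
    retrieval_cost = w_retr * n_docs               # retrieval cost penalty
    efficiency_cost = w_eff * gen_tokens           # efficiency term
    return accuracy - retrieval_cost - efficiency_cost
```

Under these weights, a correct answer built from four documents and 200 generated tokens scores 1.0 − 0.2 − 0.2 = 0.6, so the policy is pushed toward answers that stay correct with less context.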
Reported results:
• Accuracy improvement: +3.2% to +6.5%
• Retrieval cost reduction: up to 37%
• Consistent gains across models from 3.8B to 120B parameters
• Improved robustness on long-context and multi-hop queries
Tech stack: Python, PyTorch, RL (DQN), FAISS, retrieval systems, LLMs, RAG pipelines