Adaptive RAG is a reinforcement-learning-based framework that enables retrieval-augmented generation (RAG) systems to decide dynamically when to retrieve and how much context to use. Unlike static RAG pipelines, which fetch a fixed number of documents for every query, this approach models retrieval as a sequential decision process, improving both efficiency and accuracy across model scales. At each step, the system:
1. Encode query and current reasoning state into a latent representation
2. Policy decides: retrieve or continue generation
3. If retrieval triggered, dynamically select top-k documents from retriever
4. Fuse retrieved context with current prompt
5. Generate intermediate or final response using LLM
6. Update policy using reward signals based on performance and cost
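The six-step loop above can be sketched in plain Python. The `retriever`, `generator`, and `policy` stubs below are illustrative placeholders standing in for the real retriever, LLM, and learned policy, not the framework's actual API:

```python
# Toy stand-ins for the real components; names and behavior are illustrative.
def retriever(query, k):
    """Pretend retriever: returns k placeholder documents."""
    return [f"doc-{i} for '{query}'" for i in range(k)]

def generator(query, context):
    """Pretend LLM call: in reality this conditions on the fused prompt."""
    return f"answer({query}, docs={len(context)})"

def policy(state):
    """Pretend learned policy: retrieve until three documents are in context."""
    n_docs = state[0]
    return "retrieve" if n_docs < 3 else "generate"

def adaptive_rag(query, max_steps=10):
    """One pass through steps 1-5; the policy update (step 6) happens offline."""
    context, answer = [], ""
    for _ in range(max_steps):
        state = (len(context), len(answer))   # step 1: encode query + state
        action = policy(state)                # step 2: retrieve or continue
        if action == "retrieve":
            context += retriever(query, k=1)  # steps 3-4: select and fuse docs
            continue
        answer = generator(query, context)    # step 5: generate response
        break
    return answer, len(context)

answer, n_docs = adaptive_rag("who wrote Hamlet?")
```

With the stub policy, the loop retrieves three documents and then generates; a trained policy would instead vary the stopping point per query.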
The problem is formulated as a Markov Decision Process (MDP), where each state captures the current query, the retrieved context, and the partial generation. A Deep Q-Network (DQN) is trained to select retrieval actions, weighing the value of additional context against its computational cost. This enables adaptive behavior across diverse queries and model capacities.
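The MDP formulation can be illustrated with a deliberately tiny example. Here the state is just the number of retrieved documents, the environment rewards a correct answer (assumed to require at least two documents) minus a per-retrieval cost, and a tabular Q table stands in for the DQN; all numbers are illustrative, not the framework's actual values:

```python
ACTIONS = ["retrieve", "generate"]
ALPHA, GAMMA = 0.5, 0.9  # learning rate, discount factor (illustrative)

def env_step(n_docs, action):
    """Toy MDP: the answer is correct once >= 2 documents are in context;
    each retrieval costs 0.1; generating ends the episode."""
    if action == "retrieve":
        return n_docs + 1, -0.1, False
    correct = 1.0 if n_docs >= 2 else 0.0
    return n_docs, correct, True

Q = {}  # (n_docs, action) -> value; a table stands in for the DQN

def train(sweeps=100):
    """Q-learning updates applied in deterministic sweeps over the small
    state space (a real DQN would sample transitions from experience)."""
    for _ in range(sweeps):
        for n_docs in range(5):
            for a in ACTIONS:
                nxt, r, done = env_step(n_docs, a)
                best_next = 0.0 if done else max(Q.get((nxt, x), 0.0) for x in ACTIONS)
                old = Q.get((n_docs, a), 0.0)
                Q[(n_docs, a)] = old + ALPHA * (r + GAMMA * best_next - old)

train()
greedy = lambda s: max(ACTIONS, key=lambda a: Q.get((s, a), 0.0))
```

After training, the greedy policy retrieves from states 0 and 1 but generates once two documents are in context, i.e. it learns to stop retrieving as soon as the expected accuracy gain no longer justifies the cost.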
The reward signal combines three terms:
• Accuracy Reward: correctness of the final answer
• Retrieval Cost Penalty: penalizes excessive document retrieval
• Efficiency Term: encourages minimal yet sufficient context usage
• Combined objective promotes cost-aware reasoning
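The combined objective can be written as a simple weighted sum. The weights below are illustrative assumptions, not the framework's tuned values:

```python
def reward(correct, n_docs, gen_tokens,
           w_acc=1.0, w_retr=0.05, w_eff=0.001):
    """Combined reward: accuracy minus retrieval and context-usage costs.
    All weights are illustrative placeholders."""
    accuracy = w_acc * (1.0 if correct else 0.0)   # accuracy reward
    retrieval_cost = w_retr * n_docs               # retrieval cost penalty
    efficiency_cost = w_eff * gen_tokens           # efficiency term
    return accuracy - retrieval_cost - efficiency_cost
```

Under these weights, a correct answer built from four documents and 200 generated tokens scores 1.0 − 0.2 − 0.2 = 0.6, so the policy is pushed toward answers that stay correct with less context.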
Reported results:
• Accuracy improvement: +3.2% to +6.5%
• Retrieval cost reduction: up to 37%
• Consistent gains across models from 3.8B to 120B parameters
• Improved robustness on long-context and multi-hop queries
Tech stack: Python, PyTorch, RL (DQN), FAISS, retrieval systems, LLMs, RAG pipelines