10 Million Tokens: How Expanded Context Windows Are Making RAG Obsolete
Aadarsh

Agentic RAG: Combining Decision-Making with Knowledge Retrieval
Retrieval-Augmented Generation (RAG) systems, which combine the knowledge retrieval capabilities of vector databases with the generative power of large language models, have reshaped how AI applications are built. A newer paradigm extends RAG further: Agentic RAG. It combines traditional RAG with autonomous decision-making, producing systems that not only retrieve and generate information but also reason about it, make decisions, and take actions based on retrieved knowledge.
Traditional RAG systems follow a relatively straightforward workflow:
While this approach has proven extremely effective for knowledge-intensive tasks, it has a fundamental limitation: traditional RAG systems are passive. They retrieve information but lack the ability to decide what to do with that information beyond generating text.
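The passive workflow described above can be sketched in a few lines. This is a minimal illustration rather than a production pipeline: the word-overlap `retrieve` function stands in for a vector store, and `fake_llm` is a hypothetical placeholder for a real model call. All names here are assumptions made for the example.

```python
def retrieve(query, documents, k=2):
    """Rank documents by shared query words (toy stand-in for vector search)."""
    query_words = set(query.lower().split())
    scored = [(len(query_words & set(doc.lower().split())), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Keep the top k documents that share at least one word with the query
    return [doc for score, doc in scored[:k] if score > 0]


def build_prompt(query, context_docs):
    """Augment the user query with the retrieved passages."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


def fake_llm(prompt):
    """Hypothetical placeholder for a real LLM call; it only reports the prompt size."""
    return f"[answer generated from a {len(prompt)}-character prompt]"


def traditional_rag(query, documents):
    """Retrieve, augment, generate: a single pass with no decision about what to do next."""
    docs = retrieve(query, documents)
    return fake_llm(build_prompt(query, docs))


corpus = [
    "Vector databases store embeddings for similarity search",
    "Large language models generate text from a prompt",
    "Bananas are rich in potassium",
]
print(traditional_rag("How do vector databases search embeddings", corpus))
```

Note that nothing in this pipeline inspects the retrieved documents or the generated answer; that missing step is what the agentic components described in the rest of this article add.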
Agentic RAG systems address this limitation by integrating decision-making capabilities:
This shift from passive information retrieval to active decision-making represents a significant advancement in AI capabilities.
Building an effective Agentic RAG system requires several interdependent components:
The foundation of any Agentic RAG system remains effective knowledge retrieval. However, agentic systems often employ more sophisticated retrieval strategies:

class KnowledgeRetriever:
    def __init__(self, vector_store, reranker=None):
        self.vector_store = vector_store
        self.reranker = reranker  # Optional component for improved relevance

    def retrieve(self, query, task_context, n=5):
        """Retrieve documents relevant to both query and current task context"""
        # Generate embedding for the combined query and context
        embedding = self._generate_embedding(f"{query} {task_context}")

        # Retrieve twice as many candidates as needed so the reranker has room to reorder
        candidates = self.vector_store.similarity_search(embedding, k=n * 2)

        if self.reranker:
            # Re-rank documents based on relevance to task
            scored_candidates = self.reranker.rerank(
                candidates,
                query=query,
                task_context=task_context
            )
            return scored_candidates[:n]

        return candidates[:n]

    def _generate_embedding(self, text):
        # Implementation of embedding generation
        pass

Key advancements in the retrieval layer include:
The decision engine is what transforms a standard RAG system into an agentic one. It evaluates retrieved information and determines appropriate actions:

class DecisionEngine:
    def __init__(self, llm, tools):
        self.llm = llm
        self.tools = tools  # Available actions the agent can take

    def evaluate_information(self, retrieved_docs, user_query, task_state):
        """Evaluate retrieved information and decide next steps"""
        # Construct prompt for evaluation
        prompt = self._construct_evaluation_prompt(
            retrieved_docs,
            user_query,
            task_state
        )

        # Get LLM response
        evaluation = self.llm.generate(prompt)

        # Parse evaluation to determine if we have sufficient information
        if self._has_sufficient_info(evaluation):
            return self._formulate_response(retrieved_docs, user_query)
        else:
            # Determine additional information needed
            needed_info = self._extract_needed_info(evaluation)
            return self._plan_information_gathering(needed_info)

    def select_action(self, task_state, retrieved_info):
        """Select next action based on current state and retrieved information"""
        available_actions = self._get_available_actions(task_state)

        # Construct prompt for action selection
        prompt = self._construct_action_prompt(
            available_actions,
            retrieved_info,
            task_state
        )

        # Get action selection from LLM
        action_selection = self.llm.generate(prompt)

        # Parse and execute selected action
        action, params = self._parse_action(action_selection)
        return self._execute_action(action, params)

The decision engine typically employs:
Agentic RAG systems require a well-defined set of tools that allow the agent to interact with external systems or perform specific types of reasoning:

class ToolRegistry:
    def __init__(self):
        self.tools = {}

    def register_tool(self, name, function, description, parameters):
        """Register a new tool with the agent"""
        self.tools[name] = {
            "function": function,
            "description": description,
            "parameters": parameters
        }

    def execute_tool(self, tool_name, parameters):
        """Execute a registered tool with provided parameters"""
        if tool_name not in self.tools:
            raise ValueError(f"Unknown tool: {tool_name}")

        tool = self.tools[tool_name]
        # Validate parameters against required schema
        validated_params = self._validate_parameters(
            parameters,
            tool["parameters"]
        )

        # Execute tool function
        return tool["function"](**validated_params)

    def get_tool_descriptions(self):
        """Get descriptions of all available tools"""
        return {name: tool["description"] for name, tool in self.tools.items()}

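The registry above calls a `_validate_parameters` helper that is not shown. One possible shape for it is sketched below as a standalone function, assuming the schema is a plain `{name: type}` dictionary like the ones passed to `register_tool` later in this article, and assuming every listed parameter is required. Both assumptions are mine, not part of the original design.

```python
def validate_parameters(provided, schema):
    """Check provided arguments against a {name: type} schema and return a copy."""
    unknown = set(provided) - set(schema)
    if unknown:
        raise ValueError(f"Unexpected parameters: {sorted(unknown)}")

    # Assumption: every parameter named in the schema is required
    missing = set(schema) - set(provided)
    if missing:
        raise ValueError(f"Missing parameters: {sorted(missing)}")

    for name, expected_type in schema.items():
        if not isinstance(provided[name], expected_type):
            raise TypeError(
                f"Parameter '{name}' should be {expected_type.__name__}, "
                f"got {type(provided[name]).__name__}"
            )
    return dict(provided)


schema = {"query": str, "num_results": int}
print(validate_parameters({"query": "agentic rag", "num_results": 3}, schema))
```

Rejecting unknown and mistyped arguments before the tool runs matters because the arguments come from parsed LLM output, which may not match the tool's signature.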
Common tools in Agentic RAG systems include:
Unlike traditional RAG, Agentic RAG requires sophisticated state management to track the progress of multi-step tasks:

import time


class AgentMemory:
    def __init__(self):
        self.episodic_memory = []  # Record of actions and observations
        self.working_memory = {}  # Current task state
        self.semantic_memory = {}  # Long-term knowledge learned across tasks

    def add_episode(self, action, observation, result):
        """Record an action episode"""
        episode = {
            "timestamp": time.time(),
            "action": action,
            "observation": observation,
            "result": result
        }
        self.episodic_memory.append(episode)

    def update_working_memory(self, key, value):
        """Update current task state"""
        self.working_memory[key] = value

    def get_relevant_history(self, current_context, max_items=5):
        """Retrieve relevant historical episodes"""
        # Compute relevance of past episodes to current context
        scored_episodes = [
            (self._compute_relevance(episode, current_context), episode)
            for episode in self.episodic_memory
        ]

        # Sort by relevance score only (episodes are dicts and cannot be compared)
        scored_episodes.sort(reverse=True, key=lambda x: x[0])
        return [episode for _, episode in scored_episodes[:max_items]]

    def _compute_relevance(self, episode, context):
        # Implementation of relevance scoring; must return a number,
        # otherwise the sort in get_relevant_history cannot order the episodes
        raise NotImplementedError

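The relevance scoring left open in `_compute_relevance` could be filled in many ways. The sketch below uses Jaccard word overlap purely as an illustration; a real system would more likely compare embeddings. The function name and the choice of fields are assumptions made for this example.

```python
def compute_relevance(episode, context):
    """Jaccard similarity between the words of an episode and the words of the context."""
    episode_text = " ".join(
        str(episode.get(field, "")) for field in ("action", "observation", "result")
    )
    episode_words = set(episode_text.lower().split())
    context_words = set(str(context).lower().split())
    if not episode_words or not context_words:
        return 0.0
    # Shared words divided by all distinct words across both texts
    return len(episode_words & context_words) / len(episode_words | context_words)


episode = {
    "action": "web_search",
    "observation": "agentic rag papers",
    "result": "found three papers",
}
print(compute_relevance(episode, "recent agentic rag papers"))
```

Because the score is always a float between 0 and 1, it can be used directly as the sort key in `get_relevant_history`.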
Effective memory systems enable:
Several architectural patterns have emerged for implementing Agentic RAG systems:

graph TD
    A[User Query] --> B[Task Planner]
    B --> C[Subtask 1]
    B --> D[Subtask 2]
    B --> E[Subtask 3]
    C --> F[RAG Executor 1]
    D --> G[RAG Executor 2]
    E --> H[RAG Executor 3]
    F --> I[Result Aggregator]
    G --> I
    H --> I
    I --> J[Final Response]

In this pattern, complex queries are broken down into simpler subtasks, each handled by a specialized RAG component. The task planner determines which subtasks are needed, while the result aggregator combines the outputs.
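A toy version of this planner, executor, and aggregator flow is sketched below. The planner here simply splits the query on " and ", which is a stand-in for an LLM-driven planner, and the executor looks answers up in a dictionary instead of running a RAG pipeline. Every name in the sketch is hypothetical.

```python
def plan_subtasks(query):
    """Hypothetical planner: split a compound question on ' and '."""
    return [part.strip() for part in query.split(" and ") if part.strip()]


def rag_executor(subtask, knowledge):
    """Toy executor: return the first fact whose topic appears in the subtask."""
    for topic, fact in knowledge.items():
        if topic in subtask.lower():
            return fact
    return "no information found"


def aggregate(subtasks, results):
    """Combine the per-subtask answers into one response."""
    return "\n".join(f"{task}: {result}" for task, result in zip(subtasks, results))


def answer(query, knowledge):
    subtasks = plan_subtasks(query)
    results = [rag_executor(task, knowledge) for task in subtasks]
    return aggregate(subtasks, results)


knowledge = {
    "reranking": "Reranking reorders candidates by relevance",
    "embeddings": "Embeddings map text to vectors",
}
print(answer("what is reranking and how do embeddings work", knowledge))
```

Since the subtasks do not depend on one another, the executors could also run concurrently, which is one practical reason to choose this pattern.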

graph TD
    A[User Query] --> B[Agent Controller]
    B --> C{Information Sufficient?}
    C -->|No| D[RAG System]
    D --> E[Working Memory]
    E --> C
    C -->|Yes| F[Action Selection]
    F --> G[Tool Execution]
    G --> H[Response Generator]
    H --> I[Final Response]
    G --> E

In this pattern, the agent repeatedly queries its RAG system until it has sufficient information to take action, then selects and executes an appropriate tool.
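The retrieve-until-sufficient loop at the heart of this pattern might look like the sketch below. The `retrieve`, `is_sufficient`, and `refine` callables are hypothetical hooks; in a real agent the last two would typically be LLM calls. The `max_rounds` cap is my addition, included so the loop cannot run forever when the knowledge base simply lacks the answer.

```python
def iterative_retrieval(query, retrieve, is_sufficient, refine, max_rounds=3):
    """Retrieve until is_sufficient approves or the round budget is spent."""
    working_memory = []
    current_query = query
    for round_number in range(1, max_rounds + 1):
        working_memory.extend(retrieve(current_query))
        if is_sufficient(query, working_memory):
            return {"sufficient": True, "rounds": round_number, "documents": working_memory}
        # Not enough yet: ask for a refined query based on what has been gathered
        current_query = refine(query, working_memory)
    return {"sufficient": False, "rounds": max_rounds, "documents": working_memory}


# Toy hooks for demonstration only
CORPUS = {
    "pricing": ["Plan A costs $10"],
    "pricing limits": ["Plan A allows 5 users"],
}


def toy_retrieve(query):
    return CORPUS.get(query, [])


def toy_is_sufficient(query, documents):
    return len(documents) >= 2


def toy_refine(query, documents):
    return query + " limits"


result = iterative_retrieval("pricing", toy_retrieve, toy_is_sufficient, toy_refine)
print(result)
```

Once the loop reports sufficient information, control would pass to action selection and tool execution, as in the diagram.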

graph TD
    A[User Query] --> B[Initial RAG]
    B --> C[Reasoning Step 1]
    C --> D[RAG for Step 1]
    D --> E[Reasoning Step 2]
    E --> F[RAG for Step 2]
    F --> G[Reasoning Step 3]
    G --> H[Final Synthesis]
    H --> I[Response]

This pattern interleaves retrieval and reasoning steps, allowing the agent to progressively refine its understanding through a sequence of retrievals guided by interim reasoning.
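A compact way to express this interleaving is a loop in which each reasoning step both records a finding and proposes the next query. In the sketch below, `reason` is a scripted stand-in for an LLM reasoning step and `FACTS` is a toy knowledge base; both are assumptions made for illustration.

```python
def interleaved_rag(question, retrieve, reason, max_steps=3):
    """Alternate retrieval and reasoning; each reasoning step proposes the next query."""
    notes = []
    query = question
    for _ in range(max_steps):
        documents = retrieve(query)
        thought, next_query = reason(question, notes, documents)
        notes.append(thought)
        if next_query is None:  # The reasoning step decided it has enough
            break
        query = next_query
    return notes


FACTS = {
    "author of Dune": "Frank Herbert wrote Dune",
    "Frank Herbert birthplace": "Frank Herbert was born in Tacoma",
}


def toy_retrieve(query):
    return [FACTS[query]] if query in FACTS else []


def scripted_reason(question, notes, documents):
    """Scripted stand-in for an LLM: the first step asks a follow-up, the second stops."""
    finding = f"Found: {documents[0]}" if documents else "Found nothing"
    if not notes:
        return finding, "Frank Herbert birthplace"
    return finding, None


for note in interleaved_rag("author of Dune", toy_retrieve, scripted_reason):
    print(note)
```

The second query only becomes possible after the first retrieval names the author, which is the kind of multi-hop dependency this pattern is meant to handle.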
Let's explore a practical implementation of an Agentic RAG system designed to assist with research tasks:

# Relies on the KnowledgeRetriever, DecisionEngine, ToolRegistry and AgentMemory
# classes shown earlier in this article.
class ResearchAssistantAgent:
    def __init__(self, vector_db, llm):
        # Initialize core components
        self.retriever = KnowledgeRetriever(vector_db)
        self.decision_engine = DecisionEngine(llm, self._register_tools())
        self.memory = AgentMemory()
        self.llm = llm

    def _register_tools(self):
        """Register available tools for the research assistant"""
        tools = ToolRegistry()

        # Register search refinement tool
        tools.register_tool(
            name="refine_search",
            function=self._refine_search,
            description="Generate a more specific search query based on initial results",
            parameters={"original_query": str, "search_results": list, "focus_area": str}
        )

        # Register web search tool
        tools.register_tool(
            name="web_search",
            function=self._web_search,
            description="Search the web for recent information",
            parameters={"query": str, "num_results": int}
        )

        # Register paper analysis tool
        tools.register_tool(
            name="analyze_paper",
            function=self._analyze_paper,
            description="Extract key information from a research paper",
            parameters={"paper_text": str, "aspects": list}
        )

        # Register citation graph tool
        tools.register_tool(
            name="explore_citations",
            function=self._explore_citations,
            description="Find papers that cite or are cited by a given paper",
            parameters={"paper_id": str, "direction": str, "limit": int}
        )

        return tools

    async def process_query(self, user_query):
        """Process a research query from the user"""
        # Initialize task in working memory
        self.memory.update_working_memory("current_task", user_query)
        self.memory.update_working_memory("stage", "initial_retrieval")

        # Initial information retrieval
        initial_docs = self.retriever.retrieve(user_query, "initial research", n=5)
        self.memory.update_working_memory("retrieved_documents", initial_docs)

        # Decision phase - evaluate if we have enough information
        decision = self.decision_engine.evaluate_information(
            initial_docs,
            user_query,
            self.memory.working_memory
        )

        # Execute multi-step research plan
        response = await self._execute_research_plan(decision, user_query)

        # Record completed task
        self.memory.add_episode(
            action="complete_research",
            observation=user_query,
            result=response
        )

        return response

    async def _execute_research_plan(self, initial_plan, user_query):
        """Execute a multi-step research plan"""
        # The plan is expected to be a dict with a "steps" list of {"name": ..., "input": ...}
        plan_steps = initial_plan.get("steps", [])
        results = {}

        for step in plan_steps:
            # Update current stage in working memory
            self.memory.update_working_memory("stage", step["name"])

            # Execute tool specified by the plan.
            # select_action is a regular (synchronous) method, so it is not awaited.
            tool_result = self.decision_engine.select_action(
                self.memory.working_memory,
                results
            )

            # Store result
            results[step["name"]] = tool_result

            # Record step in episodic memory
            self.memory.add_episode(
                action=step["name"],
                observation=step.get("input", {}),
                result=tool_result
            )

        # Generate final response using accumulated results
        synthesis_prompt = self._create_synthesis_prompt(user_query, results)
        final_response = self.llm.generate(synthesis_prompt)

        return final_response

    # Tool implementations
    def _refine_search(self, original_query, search_results, focus_area):
        # Implementation of search refinement
        pass

    def _web_search(self, query, num_results=5):
        # Implementation of web search
        pass

    def _analyze_paper(self, paper_text, aspects):
        # Implementation of paper analysis
        pass

    def _explore_citations(self, paper_id, direction="citing", limit=10):
        # Implementation of citation exploration
        pass

    def _create_synthesis_prompt(self, query, results):
        # Create prompt for final response synthesis
        pass

This research assistant demonstrates several key aspects of Agentic RAG:
The transition from traditional RAG to Agentic RAG yields significant performance improvements across various metrics:
| Task Type | Performance Metric | Traditional RAG | Agentic RAG | Improvement (percentage points) |
|-----------|-------------------|-----------------|-------------|-------------|
| Fact-finding | Accuracy | 78% | 86% | +8 |
| Research tasks | Comprehensiveness | 64% | 92% | +28 |
| Decision-making | Decision quality | 59% | 85% | +26 |
| Multi-step tasks | Completion rate | 47% | 93% | +46 |
| Time-sensitive queries | Currency of information | 72% | 94% | +22 |
The most dramatic improvements are seen in tasks requiring:
Despite its advantages, Agentic RAG systems face several challenges:
Agentic RAG systems typically require multiple LLM calls for a single user query, significantly increasing computational costs. For example, a complex research query might involve:
This can result in 7-10 LLM calls per user query, compared to 1-2 for traditional RAG.
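To see what that difference in call count means for cost, here is a back-of-the-envelope calculation. The token count and price are placeholder figures chosen for illustration, not measurements or real vendor prices, and the model assumes every call uses the same number of tokens.

```python
def query_cost(llm_calls, tokens_per_call, price_per_1k_tokens):
    """Estimated cost of one user query, assuming identical token usage per call."""
    return llm_calls * tokens_per_call / 1000 * price_per_1k_tokens


# Placeholder figures for illustration only
TOKENS_PER_CALL = 2000
PRICE_PER_1K_TOKENS = 0.01

traditional = query_cost(2, TOKENS_PER_CALL, PRICE_PER_1K_TOKENS)
agentic = query_cost(10, TOKENS_PER_CALL, PRICE_PER_1K_TOKENS)

print(f"Traditional RAG: ${traditional:.2f} per query")
print(f"Agentic RAG:     ${agentic:.2f} per query")
print(f"Cost ratio:      {agentic / traditional:.1f}x")
```

Under these assumptions the cost scales linearly with the number of calls, so the upper ends of the two ranges (10 calls versus 2) imply roughly a fivefold increase. Latency grows in a similar way when the calls run sequentially.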
The multi-step nature of Agentic RAG creates opportunities for error propagation and amplification. An error in early stages (e.g., misinterpreting the user's intent) can lead to completely incorrect results, even if subsequent steps execute perfectly.
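A simple probability model makes the compounding effect concrete. If each step succeeds independently with the same probability, the chance that the whole chain succeeds is that probability raised to the number of steps. The independence assumption and the 95% figure are illustrative simplifications, not measured values.

```python
def pipeline_success_rate(step_success_rate, steps):
    """Probability that every step succeeds, assuming the steps fail independently."""
    return step_success_rate ** steps


for steps in (1, 3, 5, 10):
    rate = pipeline_success_rate(0.95, steps)
    print(f"{steps:>2} steps at 95% each -> {rate:.1%} end to end")
```

Even a step that is right 95% of the time leaves a ten-step chain with only about a 60% chance of finishing without an error, which is why validating intermediate results pays off.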
Each tool in an Agentic RAG system must be carefully integrated and maintained:
The complexity of Agentic RAG makes evaluation significantly more difficult:
Based on early implementations and research, several best practices have emerged:
Rather than overwhelming the agent with all possible tools, implement a hierarchy of tool access:

def get_available_tools(self, task_state):
    """Get tools available in the current task state"""
    task_type = task_state.get("task_type")
    stage = task_state.get("stage")

    # Base tools available in all contexts
    available_tools = ["search", "ask_user", "summarize"]

    # Add task-specific tools
    if task_type == "research":
        available_tools.extend(["citation_lookup", "paper_analysis"])
    elif task_type == "data_analysis":
        available_tools.extend(["data_visualization", "statistical_test"])

    # Add stage-specific tools
    if stage == "initial_exploration":
        available_tools.extend(["broad_search", "topic_clustering"])
    elif stage == "deep_dive":
        available_tools.extend(["detailed_extraction", "fact_verification"])
    elif stage == "synthesis":
        available_tools.extend(["draft_report", "create_visualization"])

    return available_tools

This approach reduces decision complexity and makes tool selection more manageable.
Enforce consistent output formats for all tools to simplify integration:

import time


def execute_tool(self, tool_name, parameters):
    """Execute a tool and ensure output follows standard format"""
    # Record the start time so execution_time can be reported below
    start_time = time.time()
    try:
        # Call the actual tool function
        raw_result = self.tools[tool_name]["function"](**parameters)

        # Ensure result follows standard format
        standardized_result = {
            "status": "success",
            "tool_name": tool_name,
            "timestamp": time.time(),
            "result": raw_result,
            "metadata": {
                "execution_time": time.time() - start_time,
                "parameters_used": parameters
            }
        }

        return standardized_result
    except Exception as e:
        # Handle errors consistently
        return {
            "status": "error",
            "tool_name": tool_name,
            "timestamp": time.time(),
            "error": str(e),
            "metadata": {
                "error_type": type(e).__name__,
                "parameters_attempted": parameters
            }
        }

Encourage the agent to document its reasoning explicitly:

def make_decision(self, options, context):
    """Make a decision with explicit reasoning steps"""
    prompt = f"""
    Current context: {context}

    Available options: {options}

    Please think through this decision step by step:
    1. What are the key factors to consider?
    2. What are the potential consequences of each option?
    3. What additional information would be helpful but is missing?
    4. Which option best addresses the current needs?

    Reasoning:
    """

    # Get the agent's reasoning
    reasoning = self.llm.generate(prompt)

    # Extract the final decision
    decision_prompt = f"""
    Based on this reasoning:

    {reasoning}

    What is your final decision? Choose one of: {", ".join(options)}

    Decision:
    """

    decision = self.llm.generate(decision_prompt)

    # Store the reasoning so it can be inspected later
    self.memory.update_working_memory("last_reasoning", reasoning)

    return decision.strip()

This approach makes the decision process more transparent and easier to debug.
Design systems to gracefully fall back to human assistance when necessary:

def execute_with_human_fallback(self, action, parameters, confidence_threshold=0.8):
    """Execute an action with fallback to human assistance if confidence is low"""
    # Assess confidence in this action
    confidence = self._assess_confidence(action, parameters)

    if confidence >= confidence_threshold:
        # Proceed with automated execution
        return self.execute_action(action, parameters)
    else:
        # Fall back to human assistance
        human_response = self._request_human_assistance(
            action=action,
            parameters=parameters,
            confidence=confidence,
            context=self.memory.working_memory
        )

        # Record human intervention
        self.memory.add_episode(
            action="human_intervention",
            observation={"action": action, "parameters": parameters},
            result=human_response
        )

        return human_response

The field of Agentic RAG is evolving rapidly, with several promising directions:
Agents that can reflect on their performance and optimize their own behavior:

def reflect_on_performance(self, task_record):
    """Analyze past performance and identify improvement opportunities"""
    # Construct reflection prompt
    reflection_prompt = f"""
    Task: {task_record['task']}
    Actions taken: {task_record['actions']}
    Final outcome: {task_record['outcome']}
    User feedback: {task_record['feedback']}

    Please analyze this task execution and identify:
    1. What went well?
    2. What could have been improved?
    3. Are there specific patterns or strategies that should be adjusted?
    4. What concrete changes would improve performance on similar tasks?
    """

    # Generate reflection
    reflection = self.llm.generate(reflection_prompt)

    # Extract actionable insights
    insights = self._parse_reflection_insights(reflection)

    # Update agent behavior based on insights
    for insight in insights:
        if insight['type'] == 'prompt_improvement':
            self._update_prompt_template(
                insight['target_prompt'],
                insight['suggested_change']
            )
        elif insight['type'] == 'tool_selection':
            self._update_tool_selection_policy(
                insight['context'],
                insight['preferred_tool']
            )
        elif insight['type'] == 'retrieval_strategy':
            self._update_retrieval_parameters(
                insight['parameter'],
                insight['new_value']
            )

Moving beyond simple vector retrieval to structured knowledge representations:

# RelationExtractor and GraphReasoner are assumed to be defined elsewhere;
# they are not shown in this article.
class FederatedKnowledgeGraph:
    def __init__(self, sources):
        self.sources = sources  # List of knowledge sources
        self.relation_extractor = RelationExtractor()
        self.reasoner = GraphReasoner()

    def query(self, question, context):
        """Query the federated knowledge graph"""
        # Convert question to graph pattern
        query_pattern = self._question_to_graph_pattern(question)

        # Query each knowledge source
        partial_results = []
        for source in self.sources:
            source_results = source.query(query_pattern)
            partial_results.append(source_results)

        # Merge and resolve conflicts
        merged_results = self._merge_results(partial_results)

        # Perform reasoning to infer additional information
        enriched_results = self.reasoner.infer(merged_results, context)

        return enriched_results

    def _question_to_graph_pattern(self, question):
        # Convert natural language question to graph query pattern
        pass

    def _merge_results(self, partial_results):
        # Merge results from multiple sources, resolving conflicts
        pass

Systems where multiple specialized agents collaborate:

# SharedMemory is assumed to be defined elsewhere; it is not shown in this article.
class AgentCollective:
    def __init__(self, agents, coordinator):
        self.agents = agents  # Dictionary of specialized agents
        self.coordinator = coordinator  # Agent that manages collaboration
        self.shared_memory = SharedMemory()

    async def solve_task(self, task):
        """Solve a complex task using multiple specialized agents"""
        # Initial task decomposition
        subtasks = await self.coordinator.decompose_task(task)

        # Assign subtasks to appropriate agents
        assignments = self.coordinator.assign_subtasks(subtasks, self.agents)

        # Execute subtasks one at a time, in the order the coordinator returned them
        results = {}
        for subtask_id, assignment in assignments.items():
            # Execute prerequisite subtasks first
            prerequisites = assignment.get("prerequisites", [])
            await self._ensure_prerequisites_completed(prerequisites, results)

            # Execute the subtask with the assigned agent
            agent = self.agents[assignment["agent"]]
            subtask_result = await agent.execute_task(
                subtask=assignment["subtask"],
                shared_context=self.shared_memory,
                previous_results=results
            )

            # Store result
            results[subtask_id] = subtask_result

            # Update shared memory
            self.shared_memory.update(subtask_id, subtask_result)

        # Final synthesis by the coordinator
        final_result = await self.coordinator.synthesize_results(
            task=task,
            subtask_results=results,
            shared_memory=self.shared_memory
        )

        return final_result

    async def _ensure_prerequisites_completed(self, prerequisites, results):
        # Wait for prerequisite tasks to complete
        pass

Agentic RAG represents a significant evolution in AI systems, moving beyond passive information retrieval to active decision-making based on retrieved knowledge. By combining the factual grounding of RAG with the autonomy of agent-based systems, Agentic RAG enables a new generation of AI applications that can research, reason, and act with greater independence and effectiveness.
As the field continues to develop, we can expect to see increasingly sophisticated systems that can handle complex, multi-step tasks with minimal human intervention. The integration of structured knowledge representations, self-improvement capabilities, and collective intelligence approaches will further enhance these systems' capabilities, opening new possibilities for AI applications across domains.
For organizations looking to deploy AI solutions, Agentic RAG offers a framework that combines the factual grounding of RAG systems with the flexibility and autonomy of agent-based approaches, creating systems that don't just know things, but know what to do with that knowledge.