06.09 — LangChain RAG Documentation: Critical Analysis¶
Overview¶
This lesson steps back from implementation to critically examine LangChain's official RAG documentation. The instructor identifies strengths and weaknesses in LangChain's recommended approaches, explains why the documentation's "agentic RAG" pattern is problematic for production, and highlights which patterns are worth adopting.
LangChain's Two RAG Approaches¶
LangChain's documentation presents two main patterns for building RAG applications. Both have significant tradeoffs:
flowchart TD
subgraph Approach1["Approach 1: Agentic RAG\n(ReAct Agent + Search Tool)"]
A1["Create retrieval tool"]
A2["Attach to ReAct agent"]
A3["Agent decides when\nto search (autonomous)"]
A1 --> A2 --> A3
end
subgraph Approach2["Approach 2: Two-Step Chain\n(Deterministic Pipeline)"]
B1["Always retrieve\nrelevant chunks"]
B2["Always augment prompt"]
B3["Always generate answer"]
B1 --> B2 --> B3
end
style Approach1 fill:#ef4444,color:#fff
style Approach2 fill:#10b981,color:#fff
Approach 1: Agentic RAG (Problematic)¶
LangChain's documentation shows wrapping the similarity search as a tool and giving it to a ReAct agent:
# LangChain docs approach (simplified)
@tool
def retrieve_context(query: str) -> str:
docs = vectorstore.similarity_search(query)
return format_docs(docs)
agent = create_react_agent(
llm,
tools=[retrieve_context],
prompt="You have access to a tool that retrieves context..."
)
Why This Is Problematic for Production¶
| Problem | Explanation |
|---|---|
| LLM decides when to search | The agent might skip searching when it should, or search unnecessarily |
| Two inference calls | One call to decide + generate the tool call, another to produce the answer → double the latency and cost |
| Manipulation risk | A ReAct agent is autonomous — it can be jailbroken to answer off-topic questions, bypass guardrails, or behave unexpectedly |
| Non-deterministic | The same query might trigger different behavior each time — the agent may or may not call the tool |
| Hidden complexity | create_react_agent abstracts away a loop that may change between LangChain versions |
[!WARNING] In production customer-facing applications, you almost never want the LLM to decide whether to search. If you're building a customer support bot that should answer from your knowledge base, you always want to search. Leaving this decision to the agent adds unnecessary risk.
When It Might Be Acceptable¶
The agentic approach has one legitimate use case: when the LLM should handle multiple types of queries, some requiring retrieval and some not (e.g., greetings, follow-ups). But even then, a deterministic router is more reliable than giving the agent full autonomy.
Approach 2: Two-Step Chain (What We Built)¶
The second approach in LangChain's docs matches what we implemented in Lessons 07–08:
# Always search, always augment, always generate
chain = (
RunnablePassthrough.assign(
context=itemgetter("question") | retriever | format_docs
)
| prompt_template
| llm
| StrOutputParser()
)
Advantages¶
| Advantage | Detail |
|---|---|
| Deterministic | Always retrieves, always augments, always generates — no surprises |
| Single inference call | One LLM call (not two) → lower cost and latency |
| Predictable behavior | Same input always produces the same pipeline behavior |
| Full control | You see and control every step |
| Traceable | Clean LangSmith trace with all steps visible |
LangChain's Acknowledged Tradeoffs¶
Even LangChain's documentation acknowledges the tradeoffs:
| Factor | Agentic RAG | Two-Step Chain |
|---|---|---|
| Control | Low — LLM decides | High — always retrieves |
| Inference calls | 2 per query | 1 per query |
| Cost | Higher | Lower |
| Latency | Higher | Lower |
| Flexibility | More — can skip search | Less — always searches |
| Reliability | Lower — non-deterministic | Higher — predictable |
The Real Problem with the Documentation¶
The documentation's Approach 1 uses create_react_agent which:
- Runs in a loop — the agent can make multiple iterations
- Abstracts away behavior — you can't easily see what the agent is doing
- Changes between versions — internal implementation may shift, breaking your app
- Over-abstracts — too much magic for something that should be explicit
flowchart LR
EXPLICIT["✅ Explicit Pipeline\n(You control every step)"]
ABSTRACT["❌ Over-Abstracted Agent\n(Hidden loop, LLM decides)"]
EXPLICIT --> GOOD["Predictable, debuggable,\nproduction-ready"]
ABSTRACT --> BAD["Non-deterministic,\nhard to debug, risky"]
style EXPLICIT fill:#10b981,color:#fff
style ABSTRACT fill:#ef4444,color:#fff
What IS Worth Adopting from the Docs¶
Custom RAG Agent via LangGraph¶
LangChain's documentation includes a "Custom RAG Agent under LangGraph" section — and this one is excellent:
flowchart TD
Q["❓ Query"]
RETRIEVE["🔍 Retrieve"]
CHECK["🔎 Check Relevance\n(is context relevant?)"]
GENERATE["🤖 Generate"]
HALLUCINATE["🔍 Check Hallucinations\n(is answer grounded?)"]
RELEVANT["🔎 Check Answer Relevance\n(does it answer the question?)"]
DONE["✅ Return Answer"]
RETRY["🔄 Retry with\nbetter search"]
Q --> RETRIEVE --> CHECK
CHECK -->|"Relevant"| GENERATE
CHECK -->|"Not relevant"| RETRY
GENERATE --> HALLUCINATE
HALLUCINATE -->|"Grounded"| RELEVANT
HALLUCINATE -->|"Not grounded"| RETRY
RELEVANT -->|"Answers question"| DONE
RELEVANT -->|"Doesn't answer"| RETRY
RETRY --> RETRIEVE
This architecture: - Is based on research papers - Uses explicit nodes and edges (LangGraph) - Has hallucination checking and relevance checking - Is deterministic in its control flow (edges, not agent decisions) - Is covered in depth in the Agentic RAG section (Section 13) of this course
Summary¶
| Pattern | Verdict | Why |
|---|---|---|
| Agentic RAG (ReAct agent + tool) | ⚠️ Avoid for production | Non-deterministic, double cost, manipulation risk, over-abstracted |
| Two-Step Chain (LCEL) | ✅ Use this | Deterministic, single call, full control, traceable |
| Custom RAG Agent (LangGraph) | ✅ Use for advanced cases | Research-backed, explicit nodes, hallucination checks |
| Key Takeaway | Detail |
|---|---|
| Don't let the LLM decide when to search | If your app needs retrieval, always retrieve |
| Explicit > Abstract | Control every step of the pipeline; avoid hidden loops |
| Two inference calls ≠ one | Agentic RAG costs twice as much per query |
| LangGraph is the real answer | For complex RAG with quality checking, use LangGraph (Section 13) |