06.09 — LangChain RAG Documentation: Critical Analysis¶

Overview¶

This lesson steps back from implementation to critically examine LangChain's official RAG documentation. The instructor identifies strengths and weaknesses in LangChain's recommended approaches, explains why the documentation's "agentic RAG" pattern is problematic for production, and highlights which patterns are worth adopting.

LangChain's Two RAG Approaches¶

LangChain's documentation presents two main patterns for building RAG applications. Both have significant tradeoffs:

flowchart TD
    subgraph Approach1["Approach 1: Agentic RAG\n(ReAct Agent + Search Tool)"]
        A1["Create retrieval tool"]
        A2["Attach to ReAct agent"]
        A3["Agent decides when\nto search (autonomous)"]
        A1 --> A2 --> A3
    end

    subgraph Approach2["Approach 2: Two-Step Chain\n(Deterministic Pipeline)"]
        B1["Always retrieve\nrelevant chunks"]
        B2["Always augment prompt"]
        B3["Always generate answer"]
        B1 --> B2 --> B3
    end

    style Approach1 fill:#ef4444,color:#fff
    style Approach2 fill:#10b981,color:#fff

Approach 1: Agentic RAG (Problematic)¶

LangChain's documentation shows wrapping the similarity search as a tool and giving it to a ReAct agent:

# LangChain docs approach (simplified)
@tool
def retrieve_context(query: str) -> str:
    docs = vectorstore.similarity_search(query)
    return format_docs(docs)

agent = create_react_agent(
    llm, 
    tools=[retrieve_context],
    prompt="You have access to a tool that retrieves context..."
)

Why This Is Problematic for Production¶

Problem	Explanation
LLM decides when to search	The agent might skip searching when it should, or search unnecessarily
Two inference calls	One call to decide + generate the tool call, another to produce the answer → double the latency and cost
Manipulation risk	A ReAct agent is autonomous — it can be jailbroken to answer off-topic questions, bypass guardrails, or behave unexpectedly
Non-deterministic	The same query might trigger different behavior each time — the agent may or may not call the tool
Hidden complexity	`create_react_agent` abstracts away a loop that may change between LangChain versions

[!WARNING] In production customer-facing applications, you almost never want the LLM to decide whether to search. If you're building a customer support bot that should answer from your knowledge base, you always want to search. Leaving this decision to the agent adds unnecessary risk.

When It Might Be Acceptable¶

The agentic approach has one legitimate use case: when the LLM should handle multiple types of queries, some requiring retrieval and some not (e.g., greetings, follow-ups). But even then, a deterministic router is more reliable than giving the agent full autonomy.

Approach 2: Two-Step Chain (What We Built)¶

The second approach in LangChain's docs matches what we implemented in Lessons 07–08:

# Always search, always augment, always generate
chain = (
    RunnablePassthrough.assign(
        context=itemgetter("question") | retriever | format_docs
    )
    | prompt_template
    | llm
    | StrOutputParser()
)

Advantages¶

Advantage	Detail
Deterministic	Always retrieves, always augments, always generates — no surprises
Single inference call	One LLM call (not two) → lower cost and latency
Predictable behavior	Same input always produces the same pipeline behavior
Full control	You see and control every step
Traceable	Clean LangSmith trace with all steps visible

LangChain's Acknowledged Tradeoffs¶

Even LangChain's documentation acknowledges the tradeoffs:

Factor	Agentic RAG	Two-Step Chain
Control	Low — LLM decides	High — always retrieves
Inference calls	2 per query	1 per query
Cost	Higher	Lower
Latency	Higher	Lower
Flexibility	More — can skip search	Less — always searches
Reliability	Lower — non-deterministic	Higher — predictable

The Real Problem with the Documentation¶

The documentation's Approach 1 uses create_react_agent which:

Runs in a loop — the agent can make multiple iterations
Abstracts away behavior — you can't easily see what the agent is doing
Changes between versions — internal implementation may shift, breaking your app
Over-abstracts — too much magic for something that should be explicit

flowchart LR
    EXPLICIT["✅ Explicit Pipeline\n(You control every step)"]
    ABSTRACT["❌ Over-Abstracted Agent\n(Hidden loop, LLM decides)"]

    EXPLICIT --> GOOD["Predictable, debuggable,\nproduction-ready"]
    ABSTRACT --> BAD["Non-deterministic,\nhard to debug, risky"]

    style EXPLICIT fill:#10b981,color:#fff
    style ABSTRACT fill:#ef4444,color:#fff

What IS Worth Adopting from the Docs¶

Custom RAG Agent via LangGraph¶

LangChain's documentation includes a "Custom RAG Agent under LangGraph" section — and this one is excellent:

flowchart TD
    Q["❓ Query"]
    RETRIEVE["🔍 Retrieve"]
    CHECK["🔎 Check Relevance\n(is context relevant?)"]
    GENERATE["🤖 Generate"]
    HALLUCINATE["🔍 Check Hallucinations\n(is answer grounded?)"]
    RELEVANT["🔎 Check Answer Relevance\n(does it answer the question?)"]
    DONE["✅ Return Answer"]
    RETRY["🔄 Retry with\nbetter search"]

    Q --> RETRIEVE --> CHECK
    CHECK -->|"Relevant"| GENERATE
    CHECK -->|"Not relevant"| RETRY
    GENERATE --> HALLUCINATE
    HALLUCINATE -->|"Grounded"| RELEVANT
    HALLUCINATE -->|"Not grounded"| RETRY
    RELEVANT -->|"Answers question"| DONE
    RELEVANT -->|"Doesn't answer"| RETRY

    RETRY --> RETRIEVE

This architecture: - Is based on research papers - Uses explicit nodes and edges (LangGraph) - Has hallucination checking and relevance checking - Is deterministic in its control flow (edges, not agent decisions) - Is covered in depth in the Agentic RAG section (Section 13) of this course

Summary¶

Pattern	Verdict	Why
Agentic RAG (ReAct agent + tool)	⚠️ Avoid for production	Non-deterministic, double cost, manipulation risk, over-abstracted
Two-Step Chain (LCEL)	✅ Use this	Deterministic, single call, full control, traceable
Custom RAG Agent (LangGraph)	✅ Use for advanced cases	Research-backed, explicit nodes, hallucination checks

Key Takeaway	Detail
Don't let the LLM decide when to search	If your app needs retrieval, always retrieve
Explicit > Abstract	Control every step of the pipeline; avoid hidden loops
Two inference calls ≠ one	Agentic RAG costs twice as much per query
LangGraph is the real answer	For complex RAG with quality checking, use LangGraph (Section 13)