07.15 — Documentation Helper in Production

Overview

In this lesson, we look at Chat LangChain, the official, open-source documentation helper built by the LangChain team. This serves as a real-world example of taking our "Documentation Helper" prototype and scaling it into a production-ready application using advanced agentic RAG patterns (specifically, explicit query rewriting and multi-agent coordination via LangGraph).


The Chat LangChain Application

If you go to chat.langchain.com, you'll see a UI similar to the Streamlit app we just built. However, its backend architecture is significantly more sophisticated.

1. The Query Expansion Flow

If you ask Chat LangChain "What is LangChain?", it doesn't just search the vector store for "What is LangChain?".

Instead, it employs Query Expansion (also known as sub-query generation):

flowchart TD
    Q[User Prompt:\n"What is LangChain?"]

    Q --> A[Subquery 1:\nReview docs and gather\ncomprehensive definition]
    Q --> B[Subquery 2:\nCore components of\nLangChain]
    Q --> C[Subquery 3:\nUse cases for\nLangChain]

    A --> VS[(Vector Store)]
    B --> VS
    C --> VS

    VS -. "Multiple Docs" .-> RANK[Re-rank and Filter Results]
    RANK --> LLM[Final Generation]

    style VS fill:#10b981,color:#fff

Why do this? A single user query is often too vague for effective semantic search. By having an LLM generate 3-5 distinct, specific questions based on the user's prompt, we cast a wider net in the vector store and retrieve higher-quality context.
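The fan-out-and-merge pattern described above can be sketched in a few lines of plain Python. Everything here is illustrative: `expand_query`, `retrieve_with_expansion`, and the stubbed `fake_llm`/`fake_search` are hypothetical names, not code from the Chat LangChain repository.

```python
# Sketch of the query-expansion pattern: one user query becomes several
# focused sub-queries, each searched independently, results deduplicated.
# (Names are illustrative, not taken from the Chat LangChain codebase.)

def expand_query(llm, user_query: str) -> list[str]:
    """Ask the LLM for distinct sub-queries. `llm` is any callable that
    maps a prompt string to a response string (one query per line)."""
    prompt = (
        "Generate 3 distinct, specific search queries for the question:\n"
        f"{user_query}\nReturn one query per line."
    )
    return [line.strip() for line in llm(prompt).splitlines() if line.strip()]

def retrieve_with_expansion(llm, search, user_query: str) -> list[dict]:
    """Run every sub-query against the store and deduplicate by doc id."""
    seen, merged = set(), []
    for sub_query in expand_query(llm, user_query):
        for doc in search(sub_query):
            if doc["id"] not in seen:
                seen.add(doc["id"])
                merged.append(doc)
    return merged

# Stub LLM and vector-store search, just to exercise the flow:
fake_llm = lambda prompt: "definition of LangChain\ncore components\nuse cases"
fake_search = lambda q: [{"id": len(q), "text": f"doc for {q!r}"}]

docs = retrieve_with_expansion(fake_llm, fake_search, "What is LangChain?")
print(len(docs))  # → 3
```

In production, `search` would be a vector store's similarity search and `llm` a chat model; the re-ranking step shown in the diagram would then score the merged list before generation.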

2. Generative UI (Transparency & Trust)

Chat LangChain exposes its internal state directly to the user interface. It shows:

  • The sub-queries being generated.
  • The parallel searches hitting the vector store.
  • The exact documents retrieved before generation begins.

This is the principle of Generative UI introduced in the previous lesson. It builds immense trust by showing the "scratchpad" of the agent's reasoning.


3. Dissecting the Architecture

Chat LangChain is fully open-source on GitHub (langchain-ai/chat-langchain). Let's look at its tech stack:

Component | Responsibility | Pattern Implementation
--- | --- | ---
LangChain Core | Prompts, embeddings, vector store connections | Basic integration
LangGraph | Multi-agent coordination, flow control | The core logic engine driving query routing and aggregation
Next.js + TypeScript | The frontend | Consumes the streaming state of the LangGraph execution

Exploring prompts.py

If we check the repository (backend/chat_langchain/prompts.py), we find highly specific prompts used throughout the application pipeline:

  • Router Prompt: Decides if the question requires looking at the codebase or documentation.
  • Generate Queries Prompt: Takes the user's intent and forces the LLM to write 3 optimal semantic search queries.
  • Answer Prompt: The final prompt combining all the retrieved context.
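The routing step in the list above boils down to a classification call. Here is a minimal sketch of that idea; the prompt text and the `route` function are invented for illustration and are not the actual contents of prompts.py.

```python
# Hedged sketch of a router: classify a question as needing the codebase
# or the documentation, falling back to "docs" on unexpected output.
# (Prompt wording is illustrative, not taken from the repository.)

ROUTER_PROMPT = (
    "Classify the user question as 'code' (requires looking at the "
    "codebase) or 'docs' (answerable from documentation).\n"
    "Question: {question}\nAnswer with one word:"
)

def route(llm, question: str) -> str:
    label = llm(ROUTER_PROMPT.format(question=question)).strip().lower()
    return label if label in {"code", "docs"} else "docs"  # safe default

# Stub LLM standing in for a real chat model:
fake_llm = lambda prompt: "docs"
print(route(fake_llm, "What is LangChain?"))  # → docs
```

Constraining the output to a small label set, with a safe default, is what makes the route usable as a branch condition in a LangGraph state machine.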

Example: Coreference Resolution

Suppose you ask: "Who created LangChain?" and then follow up with: "When was it created?"

The system performs Coreference Resolution. It looks at the chat history, understands that "it" refers to LangChain, and automatically rewrites the query to "When was LangChain created?" before performing the similarity search. Without this step, searching a vector store for "When was it created?" would return useless results.
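The rewrite step can be sketched as a single LLM call that sees the chat history. As before, `rewrite_followup` and the stubbed `fake_llm` are hypothetical names used only to show the shape of the technique.

```python
# Minimal sketch of coreference resolution before retrieval: the LLM
# rewrites a follow-up question so it stands alone, replacing pronouns
# with their referents from the chat history. (Illustrative only.)

def rewrite_followup(llm, chat_history: list[tuple[str, str]], question: str) -> str:
    """Return a self-contained version of `question` given the history."""
    history = "\n".join(f"{role}: {text}" for role, text in chat_history)
    prompt = (
        "Given the conversation below, rewrite the final question so it "
        "can be understood without the history. Replace pronouns like "
        f"'it' with their referents.\n\n{history}\n\n"
        f"Question: {question}\nRewritten:"
    )
    return llm(prompt).strip()

# Stub LLM standing in for a real chat model:
fake_llm = lambda prompt: "When was LangChain created?"
history = [("user", "Who created LangChain?"),
           ("assistant", "Harrison Chase created LangChain.")]

print(rewrite_followup(fake_llm, history, "When was it created?"))
# → When was LangChain created?
```

The rewritten question, not the raw follow-up, is what gets embedded and sent to the vector store.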


Summary

The difference between our prototype and a production app lies in the intermediate steps.

Feature | Prototype (Our App) | Production (Chat LangChain)
--- | --- | ---
Retrieval | Single search based on the user query | Query expansion (sub-queries), re-ranking
Context | User prompt hits the vector store directly | Coreference resolution (contextual rewriting)
Flow Control | Prebuilt agent (create_agent) | Custom LangGraph state machine
UI | Synchronous Streamlit | Streaming generative UI (Next.js)
Multi-Agent | No | Yes (routing between code vs. docs)

We will learn how to build architectures like Chat LangChain's natively in the dedicated LangGraph section of the course!