06.07 — Medium Analyzer: Naive Retrieval¶

Overview¶

This lesson implements the retrieval pipeline — the second half of RAG. We take the user's question, search Pinecone for relevant chunks, inject them into the prompt, and have the LLM generate a grounded answer. We start with a naive, step-by-step implementation (no LCEL) to understand exactly what happens at each stage before moving to the elegant chain-based approach in Lesson 08.

The Goal: Answer Questions About the Blog¶

We ingested a Medium article about vector databases in Lesson 05. Now we want to ask questions about it and get accurate, grounded answers.

Test query: "What is Pinecone in machine learning?"

First: Without RAG (Baseline)¶

from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage

llm = ChatOpenAI(model="gpt-3.5-turbo")
result = llm.invoke([HumanMessage(content="What is Pinecone in machine learning?")])
print(result.content)

Output: "A pinecone algorithm is a method used in machine learning to search for the best configuration hyperparameters..."

❌ Wrong answer. GPT-3.5 (training data cutoff) doesn't strongly associate "Pinecone" with the vector database. It hallucinates about a "pinecone algorithm." This is exactly the problem RAG solves — grounding the LLM's answer in actual data.

[!NOTE] GPT-4o and later models know about Pinecone because their training data is more recent. But the principle holds: for private/niche/recent data, the LLM needs external context.

Setting Up the Retrieval Components¶

import os
from dotenv import load_dotenv
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_pinecone import PineconeVectorStore

load_dotenv()

# Initialize components
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
llm = ChatOpenAI(model="gpt-3.5-turbo")

# Connect to existing Pinecone index (already populated by ingestion)
vectorstore = PineconeVectorStore(
    index_name=os.environ["INDEX_NAME"],
    embedding=embeddings
)

# Create a retriever (top 3 most relevant chunks)
retriever = vectorstore.as_retriever(
    search_kwargs={"k": 3}
)

The Retriever Object¶

vectorstore.as_retriever() converts the vector store into a LangChain Retriever — an object with an .invoke() method that: 1. Takes a query string 2. Embeds it using the same embedding model 3. Performs similarity search in Pinecone 4. Returns the top K most relevant Document objects

flowchart LR
    Q["'What is Pinecone?'"]
    R["🔍 Retriever\n(k=3)"]
    D1["📄 Chunk #7\nscore: 0.92"]
    D2["📄 Chunk #12\nscore: 0.88"]
    D3["📄 Chunk #3\nscore: 0.85"]

    Q --> R
    R --> D1
    R --> D2
    R --> D3

    style R fill:#f59e0b,color:#fff

The Prompt Template¶

prompt_template = ChatPromptTemplate.from_template(
    """Answer the question based only on the following context:

{context}

Question: {question}

Provide a detailed answer."""
)

This is the augmented prompt — it has two placeholders: - {context} — where the retrieved chunks will be inserted - {question} — the original user query

The Format Function¶

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

Takes a list of Document objects and concatenates their text content into a single string, separated by double newlines. This formatted string is what gets injected into the {context} placeholder.

The Naive Implementation¶

def retrieve_without_lcel(query: str) -> str:
    """Simple retrieval: manually retrieve, format, and generate."""

    # Step 1: Retrieve relevant documents
    documents = retriever.invoke(query)
    # → List of 3 Documents (the most relevant chunks)

    # Step 2: Format documents into a context string
    context = format_docs(documents)
    # → "Pinecone is a managed vector database...\n\n..."

    # Step 3: Create the augmented prompt
    messages = prompt_template.format_messages(
        context=context,
        question=query
    )
    # → [HumanMessage(content="Answer based on...\n\nContext:...\nQuestion:...")]

    # Step 4: Send to LLM
    response = llm.invoke(messages)

    # Step 5: Return the answer
    return response.content

Step-by-Step Flow¶

sequenceDiagram
    participant App
    participant Retriever as Pinecone Retriever
    participant LLM as GPT-3.5 Turbo

    App->>Retriever: "What is Pinecone in ML?"
    Note over Retriever: 1. Embed query<br/>2. Similarity search<br/>3. Return top 3 chunks
    Retriever-->>App: [Doc1, Doc2, Doc3]

    Note over App: Format docs into string<br/>Populate prompt template

    App->>LLM: Augmented prompt<br/>(context + question)
    LLM-->>App: "Pinecone is a fully managed<br/>cloud-based vector database..."

Running It¶

if __name__ == "__main__":
    query = "What is Pinecone in machine learning?"

    # Without RAG (baseline)
    raw = llm.invoke([HumanMessage(content=query)])
    print(f"Without RAG: {raw.content}")

    # With RAG
    result = retrieve_without_lcel(query)
    print(f"With RAG: {result}")

Output with RAG: "Pinecone is a fully managed cloud-based vector database specifically designed for businesses and organizations looking to build and deploy large-scale machine learning applications..."

✅ Correct answer, grounded in the actual Medium article we ingested.

Examining the Steps in Debug¶

Step 1 — Retrieved Documents¶

documents = retriever.invoke(query)
# len(documents) → 3
# documents[0].page_content → "Pinecone is a managed vector database..."
# documents[0].metadata → {"source": "./mediumblog.txt"}

The retriever returns the 3 most semantically similar chunks. Each is a Document with the chunk text and source metadata.

Step 2 — Formatted Context¶

context = format_docs(documents)
# → "Pinecone is a managed vector database...\n\n
#    Vector embeddings are representations...\n\n
#    The key advantage of Pinecone is..."

A single string with all 3 chunks, separated by double newlines.

Step 3 — Augmented Prompt¶

messages = prompt_template.format_messages(context=context, question=query)
# messages[0].content →
#   "Answer the question based only on the following context:
#    
#    Pinecone is a managed vector database...
#    Vector embeddings are representations...
#    The key advantage of Pinecone is...
#    
#    Question: What is Pinecone in machine learning?
#    
#    Provide a detailed answer."

LangSmith Trace Analysis¶

Because the naive implementation uses individual component calls (not a chain), the LangSmith trace shows separate, disconnected entries:

📊 Trace 1: VectorStoreRetriever
├── Input: "What is Pinecone in machine learning?"
└── Output: [Doc1, Doc2, Doc3]

📊 Trace 2: ChatOpenAI (separate trace!)
├── Input: Augmented prompt
└── Output: "Pinecone is a fully managed..."

This is the main disadvantage of the naive approach — the components aren't linked in a single trace. Debugging requires manually matching trace entries.

Limitations of the Naive Approach¶

Limitation	Problem
No streaming	Must wait for the full response; can't stream tokens to the user
No async	Blocks the thread during retrieval and LLM call
No composability	Can't easily pipe this into other chains or add middleware
Poor tracing	Components appear as separate LangSmith traces
Manual wiring	Every step must be manually connected — error-prone

These limitations are all solved by the LCEL-based approach in Lesson 08.

Summary¶

Step	Code	What Happens
Initialize	`vectorstore.as_retriever(k=3)`	Creates a retriever that returns top 3 chunks
Retrieve	`retriever.invoke(query)`	Embeds query → similarity search → returns Documents
Format	`format_docs(documents)`	Joins chunk texts into one context string
Augment	`prompt_template.format_messages(context, question)`	Inserts context + question into prompt
Generate	`llm.invoke(messages)`	LLM produces answer grounded in context

Key Insight	Detail
RAG works	Same model (GPT-3.5) gives wrong answer without RAG, correct answer with RAG
Naive = clear	Easy to understand what happens at each step
Naive = limited	No streaming, no async, poor tracing, manual wiring
Next step	Lesson 08 rebuilds this with LCEL for a production-ready chain