06.10 - RAG Implementation with Vector Stores: Quiz & Review
Overview
This quiz covers the complete RAG pipeline, from ingestion (loading, splitting, embedding, storing) to retrieval (searching, augmenting, generating). Each question tests a core concept from the section.
Question 1: CharacterTextSplitter
Q: What is the primary purpose of CharacterTextSplitter with chunk_size=1000 and chunk_overlap=0 in the RAG pipeline?
A: To break large documents into smaller, focused chunks that can be embedded and retrieved individually.
Deep Explanation:
Documents are typically too large to embed as a single vector: a 300-page book would produce one massive, unfocused vector that can't be meaningfully compared to short user queries. The text splitter divides the document into smaller chunks (roughly 1,000 characters each) so that each chunk represents a focused, searchable unit of information.
flowchart LR
DOC["Full Document\n(50,000 chars)"]
SPLIT["CharacterTextSplitter\n(chunk_size=1000)"]
CHUNKS["50 Chunks\n(~1000 chars each)"]
DOC --> SPLIT --> CHUNKS
With chunk_overlap=0, adjacent chunks don't share content. This is simpler but risks losing information split across chunk boundaries. For production, chunk_overlap=100-200 is often recommended to preserve context continuity.
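The effect of chunk_overlap can be seen in a minimal sliding-window sketch. This is plain Python written for illustration and is not how CharacterTextSplitter works internally (the real splitter cuts on a separator and then merges the pieces); the function name split_fixed is hypothetical.

```python
def split_fixed(text: str, chunk_size: int = 1000, chunk_overlap: int = 0) -> list[str]:
    """Cut text into fixed-size windows that start every chunk_size - chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


# Stand-in for a 50,000-character document (digits, so overlaps are visible).
document = "".join(str(i % 10) for i in range(50_000))

no_overlap = split_fixed(document, chunk_size=1000, chunk_overlap=0)
with_overlap = split_fixed(document, chunk_size=1000, chunk_overlap=200)

print(len(no_overlap))    # 50 chunks, matching the diagram above
print(len(with_overlap))  # more chunks, because each one repeats 200 characters
# The tail of one chunk is the head of the next when overlap is enabled.
print(with_overlap[0][-200:] == with_overlap[1][:200])
```

With overlap enabled, a sentence that straddles a boundary appears whole in at least one chunk, at the cost of storing and embedding some text twice.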
Question 2: OpenAIEmbeddings
Q: In the context of RAG ingestion, what role do OpenAIEmbeddings serve?
A: They convert text chunks into high-dimensional vector representations that capture semantic meaning.
Deep Explanation:
Embeddings are the bridge between text (what humans understand) and vectors (what machines can search). The embedding model transforms each chunk into a list of 1536 numbers such that semantically similar texts produce similar vectors:
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
# "Pinecone is a vector database" → [0.12, 0.85, -0.34, ...] (1536 floats)
# "Vector databases store embeddings" → [0.11, 0.83, -0.31, ...] (similar!)
# "The weather is sunny today" → [-0.67, 0.21, 0.93, ...] (very different!)
Without embeddings, there's no way to perform semantic search. Keyword search would miss synonyms, paraphrases, and related concepts.
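What "similar vectors" means can be made concrete with cosine similarity. The sketch below uses only the first three values of each illustrative vector shown above, so the numbers are toy data rather than real model output; the helper name cosine_similarity is an assumption for this example.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: near 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors; real embeddings here would have 1536 dimensions.
pinecone_is_db = [0.12, 0.85, -0.34]
dbs_store_embeddings = [0.11, 0.83, -0.31]
weather_is_sunny = [-0.67, 0.21, 0.93]

related = cosine_similarity(pinecone_is_db, dbs_store_embeddings)
unrelated = cosine_similarity(pinecone_is_db, weather_is_sunny)

print(f"related:   {related:.3f}")
print(f"unrelated: {unrelated:.3f}")
```

The two database sentences score close to 1, while the weather sentence scores below zero, which is the signal a vector store uses to rank results.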
Question 3: PineconeVectorStore.from_documents()
Q: What happens when you call PineconeVectorStore.from_documents(docs, embeddings, index_name="rag-index")?
A: It embeds each document chunk using the provided embedding model, then stores both the vectors and their metadata (text + source) in the Pinecone index.
Deep Explanation:
This single call orchestrates the entire storage pipeline:
sequenceDiagram
participant App
participant OpenAI as OpenAI Embeddings API
participant PC as Pinecone Index
loop For each chunk (batched)
App->>OpenAI: "Pinecone is a vector database..."
OpenAI-->>App: [0.12, 0.85, -0.34, ...]
end
App->>PC: Upsert vectors + metadata
PC-->>App: 20 vectors stored
LangChain handles batching (to avoid rate limits), threading (for parallel embedding), and error handling behind this simple API.
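Conceptually, the call boils down to "embed in batches, then upsert vectors with their metadata". The following is a rough sketch of that loop using a fake embedder and a plain dict as the index. It is not LangChain's or Pinecone's implementation, and every name in it (fake_embed, ingest, the chunk-N ids, blog.txt) is hypothetical.

```python
from typing import Callable

Vector = list[float]


def fake_embed(texts: list[str]) -> list[Vector]:
    """Stand-in for an embeddings API call: one (meaningless) vector per text."""
    return [[float(len(t)), float(t.count(" "))] for t in texts]


def ingest(
    chunks: list[dict],
    embed: Callable[[list[str]], list[Vector]],
    index: dict,
    batch_size: int = 8,
) -> int:
    """Embed chunks batch by batch, then upsert vector + metadata under a stable id."""
    for start in range(0, len(chunks), batch_size):
        batch = chunks[start:start + batch_size]
        vectors = embed([c["text"] for c in batch])  # one embedding call per batch
        for offset, (chunk, vector) in enumerate(zip(batch, vectors)):
            index[f"chunk-{start + offset}"] = {
                "values": vector,
                "metadata": {"text": chunk["text"], "source": chunk["source"]},
            }
    return len(index)


chunks = [{"text": f"Chunk number {i}", "source": "blog.txt"} for i in range(20)]
index: dict = {}
stored = ingest(chunks, fake_embed, index)
print(f"{stored} vectors stored")
```

Storing the original text in the metadata is what lets the retriever hand readable chunks back to the prompt later, rather than bare vectors.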
Question 4: vectorstore.as_retriever()
Q: What is the purpose of vectorstore.as_retriever() in the retrieval chain?
A: It converts the vector store into a Retriever interface: a LangChain Runnable with .invoke() that accepts a query string and returns the most similar document chunks.
Deep Explanation:
A vector store knows how to store and search. A retriever wraps it in the Runnable interface needed for LCEL chains:
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
# Now it's a Runnable and can be used in LCEL pipes:
docs = retriever.invoke("What is Pinecone?")  # → [Doc1, Doc2, Doc3]
The k=3 parameter says: return the 3 most semantically similar chunks. Under the hood, the retriever embeds the query, runs a similarity search in Pinecone (cosine similarity, if the index was created with that metric), and returns the top results.
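The embed, search, return-top-k sequence can be imitated in memory. The sketch below is a toy stand-in with the same .invoke(query) shape; ToyRetriever, toy_embed, and the word-count "embedding" are assumptions made for illustration and bear no relation to how Pinecone indexes vectors.

```python
import heapq
import math
import re

VOCAB = ["pinecone", "vector", "database", "embeddings", "weather", "sunny"]


def toy_embed(text: str) -> list[float]:
    """Hypothetical embedding: count how often each vocabulary word appears."""
    words = re.findall(r"[a-z]+", text.lower())
    return [float(words.count(term)) for term in VOCAB]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0  # zero vectors match nothing


class ToyRetriever:
    """In-memory stand-in exposing the same .invoke(query) shape as a retriever."""

    def __init__(self, docs: list[str], k: int = 3) -> None:
        self.k = k
        self.entries = [(toy_embed(doc), doc) for doc in docs]  # ingestion step

    def invoke(self, query: str) -> list[str]:
        query_vector = toy_embed(query)  # 1. embed the query
        best = heapq.nlargest(           # 2. score every entry, 3. keep the k best
            self.k, self.entries, key=lambda entry: cosine(query_vector, entry[0])
        )
        return [doc for _, doc in best]


docs = [
    "Pinecone is a vector database",
    "Vector databases store embeddings",
    "The weather is sunny today",
]
retriever = ToyRetriever(docs, k=2)
results = retriever.invoke("What is the Pinecone vector database?")
print(results)
```

A real vector database avoids scoring every entry by using an approximate nearest-neighbour index, but the contract seen by the chain is the same: a string goes in, the k closest chunks come out.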
Question 5: The RAG Prompt
Q: How does the augmented prompt combine context and question?
A: The prompt template has two placeholders: {context} (filled with retrieved chunks) and {question} (filled with the user's query). The LLM is instructed to answer only based on the provided context.
Deep Explanation:
prompt_template = ChatPromptTemplate.from_template("""
Answer the question based only on the following context:
{context}
Question: {question}
Provide a detailed answer.
""")
The phrase "based only on the following context" is critical: it tells the LLM to ground its answer in the retrieved chunks rather than its training data. This reduces the risk of hallucination and keeps the answer tied to your actual documents.
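To see what the model actually receives, the placeholder filling can be reproduced with plain str.format. This is a sketch of the idea only: ChatPromptTemplate produces chat messages rather than a bare string, and the helper build_prompt is hypothetical.

```python
RAG_TEMPLATE = """Answer the question based only on the following context:
{context}
Question: {question}
Provide a detailed answer."""


def build_prompt(chunks: list[str], question: str) -> str:
    """Join retrieved chunks into one context block and fill both placeholders."""
    context = "\n\n".join(chunks)
    return RAG_TEMPLATE.format(context=context, question=question)


prompt = build_prompt(
    ["Pinecone is a vector database.", "Vector databases store embeddings."],
    "What is Pinecone?",
)
print(prompt)
```

Printing the filled prompt during development is a quick way to check that the retrieved chunks are relevant before blaming the model for a poor answer.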
Question 6: Invoking the Retrieval Chain
Q: In the retrieval chain, what happens when you invoke with {"question": query}?
A: The query is embedded into a vector, similar document chunks are retrieved from Pinecone, the prompt is augmented with the chunks as context, and the LLM generates a grounded answer.
Deep Explanation (LCEL flow):
flowchart TD
INPUT["{'question': 'What is Pinecone?'}"]
IG["itemgetter('question')\n→ 'What is Pinecone?'"]
RET["Retriever: embed + search\n→ [Doc1, Doc2, Doc3]"]
FMT["format_docs\n→ context string"]
ASSIGN["RunnablePassthrough.assign\n→ {'question': '...', 'context': '...'}"]
PROMPT["ChatPromptTemplate\n→ augmented prompt"]
LLM["ChatOpenAI\n→ AIMessage"]
PARSE["StrOutputParser\n→ answer string"]
INPUT --> IG --> RET --> FMT --> ASSIGN
ASSIGN --> PROMPT --> LLM --> PARSE
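The data flow in the diagram can be traced with ordinary functions. The sketch below replaces every LangChain component with a stub, so it shows only how the dictionary and strings move between steps; the stub names (retriever, fake_llm, chain) and their canned outputs are assumptions, not the section's actual chain.

```python
from operator import itemgetter


def retriever(query: str) -> list[str]:
    """Stub: a real retriever would embed the query and search the index."""
    return ["Pinecone is a vector database.", "Vector databases store embeddings."]


def format_docs(docs: list[str]) -> str:
    return "\n\n".join(docs)


def assign_context(inputs: dict) -> dict:
    """Mimics RunnablePassthrough.assign: keep the input keys, add 'context'."""
    question = itemgetter("question")(inputs)
    return {**inputs, "context": format_docs(retriever(question))}


def prompt(inputs: dict) -> str:
    return (
        "Answer the question based only on the following context:\n"
        f"{inputs['context']}\nQuestion: {inputs['question']}"
    )


def fake_llm(prompt_text: str) -> dict:
    """Stub for the chat model: echoes the first context line as its 'answer'."""
    return {"content": prompt_text.splitlines()[1]}


def parse(message: dict) -> str:
    """Plays the role of StrOutputParser: message in, plain string out."""
    return message["content"]


def chain(inputs: dict) -> str:
    return parse(fake_llm(prompt(assign_context(inputs))))


answer = chain({"question": "What is Pinecone?"})
print(answer)
```

Note that the original question survives the assign step untouched, which is why the prompt can fill both {context} and {question} from a single dictionary.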
Question 7: Environment Variables
Q: Why is the environment variable INDEX_NAME used instead of hardcoding the Pinecone index name?
A: It allows flexible deployment across different environments (dev, staging, production) without changing code.
Deep Explanation:
# Development
INDEX_NAME=rag-dev-index
# Staging
INDEX_NAME=rag-staging-index
# Production
INDEX_NAME=rag-prod-index
The same application code works in all environments. This is a 12-factor app best practice. For PINECONE_API_KEY specifically, LangChain's Pinecone integration auto-detects this exact variable name, so using a different name would break the auto-detection.
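On the application side, reading the variable might look like the sketch below. The helper get_index_name and its fail-fast behaviour are assumptions for illustration; the section's code may simply call os.environ directly.

```python
import os
from typing import Optional


def get_index_name(default: Optional[str] = None) -> str:
    """Read INDEX_NAME from the environment; fail fast if it is missing."""
    name = os.environ.get("INDEX_NAME", default)
    if not name:
        raise RuntimeError("INDEX_NAME is not set; export it or add it to your .env file")
    return name


# Normally set by the shell, the deployment platform, or a .env loader.
os.environ["INDEX_NAME"] = "rag-dev-index"
print(get_index_name())
```

Failing at startup with a clear message is usually preferable to discovering a missing index name on the first query.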