07.03 — Environment Setup

Overview

This lesson sets up the development environment for the Documentation Helper project — cloning the repository, installing dependencies, creating a Pinecone index, and configuring all required API keys and environment variables.


Step 1: Clone the Repository

git clone <repository-url> -b 1-start-here
cd langchain-course

The -b 1-start-here flag checks out the starting branch with all boilerplate code pre-configured.


Step 2: Create the Pinecone Index

Navigate to pinecone.io and create a new index:

| Setting | Value | Why |
| --- | --- | --- |
| Index name | langchain-doc-index | Descriptive; must match INDEX_NAME in .env |
| Embedding model | text-embedding-3-small (OpenAI) | Select from the model dropdown |
| Dimensions | 1536 | Default output size of text-embedding-3-small; keeps maximum semantic information |
| Metric | cosine | Standard for text similarity |
| Capacity | Serverless | Scales automatically; works on the free tier |
| Cloud / Region | AWS us-east-1 (default) | Choose based on latency and compliance |
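
The same settings can also be applied from code instead of the console. The sketch below is only an illustration: it assumes the official Pinecone Python client (version 3 or later, which exposes `Pinecone` and `ServerlessSpec`) is installed and that `PINECONE_API_KEY` is set. The helper names `index_settings` and `create_index_if_missing` are hypothetical and not part of the course repository. It sets the dimension and metric explicitly rather than selecting a model from the dropdown.

```python
import os


def index_settings(name: str = "langchain-doc-index") -> dict:
    """Return the index settings listed above as plain data (hypothetical helper)."""
    return {
        "name": name,
        "dimension": 1536,  # output size of text-embedding-3-small
        "metric": "cosine",
        "cloud": "aws",
        "region": "us-east-1",
    }


def create_index_if_missing(settings: dict) -> None:
    """Create the serverless index unless one with that name already exists."""
    # Imported lazily so index_settings() works without the client installed.
    from pinecone import Pinecone, ServerlessSpec

    pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
    if settings["name"] in pc.list_indexes().names():
        return  # nothing to do; the index is already there
    pc.create_index(
        name=settings["name"],
        dimension=settings["dimension"],
        metric=settings["metric"],
        spec=ServerlessSpec(cloud=settings["cloud"], region=settings["region"]),
    )


# Uncomment to call Pinecone (needs network access and a valid API key):
# create_index_if_missing(index_settings())
```

Keeping the settings in one function makes it harder for the dimension in the index and the embedding model used later to drift apart.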

Production Considerations

| Factor | What to Consider |
| --- | --- |
| Cloud provider | Choose AWS, GCP, or Azure to match your existing infrastructure |
| Region | Deploy in the same region as your application to minimize latency |
| GDPR | For EU compliance, select a European data center region |
| Latency and cost | Co-locate the vector store with your RAG application to cut round-trip time and avoid cross-region egress costs |

Step 3: Configure Environment Variables

Create a .env file (gitignored — never commit secrets):

# LLM + Embeddings
OPENAI_API_KEY=sk-your-key-here

# Vector Store
PINECONE_API_KEY=pcsk-your-key-here
INDEX_NAME=langchain-doc-index

# Tracing (recommended)
LANGSMITH_API_KEY=ls-your-key-here
LANGSMITH_TRACING_V2=true
LANGSMITH_PROJECT=documentation-helper

# Web Crawling
TAVILY_API_KEY=tvly-your-key-here

| Variable | Purpose |
| --- | --- |
| OPENAI_API_KEY | Embedding API + LLM inference |
| PINECONE_API_KEY | Vector store authentication |
| INDEX_NAME | Which Pinecone index to use |
| LANGSMITH_* | Tracing every step of the pipeline in LangSmith |
| TAVILY_API_KEY | Web crawling and content extraction |
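
A quick way to catch a missing or blank variable before running anything else is to check the environment against the list above. This is a minimal sketch using only the standard library; `REQUIRED`, `OPTIONAL`, and `missing_vars` are hypothetical names, and treating the LangSmith variables as optional is an assumption based on the "recommended" label in the .env template. It reads `os.environ` directly, so call `load_dotenv()` first if the values live only in the .env file.

```python
import os
from collections.abc import Mapping

# Variables the project cannot run without.
REQUIRED = ("OPENAI_API_KEY", "PINECONE_API_KEY", "INDEX_NAME", "TAVILY_API_KEY")
# Tracing is recommended rather than mandatory (assumption).
OPTIONAL = ("LANGSMITH_API_KEY", "LANGSMITH_TRACING_V2", "LANGSMITH_PROJECT")


def missing_vars(env: Mapping, names=REQUIRED) -> list:
    """Return the names that are unset or blank in the given mapping."""
    return [name for name in names if not env.get(name, "").strip()]


missing = missing_vars(os.environ)
print("Missing required variables:", ", ".join(missing) or "none")
print("Missing optional variables:", ", ".join(missing_vars(os.environ, OPTIONAL)) or "none")
```

Passing the environment in as a mapping, rather than reading `os.environ` inside the function, keeps the check easy to test with a plain dictionary.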

Step 4: Install Dependencies

# Using Pipenv (as shown in this section's videos)
pipenv install

# Using uv (modern alternative — recommended)
uv sync

Key Dependencies

| Package | Purpose |
| --- | --- |
| langchain | Core framework: chains, prompts, LCEL |
| langchain-community | Community document loaders |
| langchain-openai | OpenAI embeddings + chat models |
| langchain-pinecone | Pinecone vector store integration |
| langchain-tavily | Tavily crawling integration (TavilyCrawl, TavilyMap, TavilyExtract) |
| streamlit | Frontend chat UI |
| python-dotenv | Load .env file |
| certifi | SSL certificates for HTTP requests |
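
After installing, it can help to confirm that each package listed above actually landed in the active environment. The sketch below uses the standard library's `importlib.metadata`; the `PACKAGES` tuple and the `installed_versions` helper are hypothetical names introduced for illustration.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as listed in the Key Dependencies table.
PACKAGES = (
    "langchain",
    "langchain-community",
    "langchain-openai",
    "langchain-pinecone",
    "langchain-tavily",
    "streamlit",
    "python-dotenv",
    "certifi",
)


def installed_versions(packages=PACKAGES) -> dict:
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions().items():
    print(f"{name:<22}{ver or 'NOT INSTALLED'}")
```

Run it inside the project environment (for example through `pipenv run` or `uv run`); otherwise it reports on whichever interpreter happens to be first on the path.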

Step 5: Validate the Setup

# ingestion.py (boilerplate)
import os
from dotenv import load_dotenv

load_dotenv()

if __name__ == "__main__":
    print("Ingestion...")
    print(os.environ["PINECONE_API_KEY"][:8] + "...")

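Run: python ingestion.py (or pipenv run python ingestion.py / uv run python ingestion.py if the virtual environment is not activated) → should print "Ingestion..." followed by the first eight characters of the Pinecone key. A KeyError means the .env file was not loaded or PINECONE_API_KEY is missing.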


Project Structure

langchain-course/
├── .env                  ← API keys (gitignored)
├── .gitignore
├── Pipfile / pyproject.toml  ← Dependencies
├── Pipfile.lock / uv.lock    ← Locked versions
├── ingestion.py          ← Ingestion pipeline (crawl → chunk → embed → store)
├── main.py               ← Streamlit frontend
├── logger.py             ← Color-coded logging utilities
├── backend/
│   ├── __init__.py
│   └── core.py           ← Retrieval agent implementation
└── docs/                 ← (created during ingestion)

Summary

| Step | What We Did |
| --- | --- |
| Clone repo | Starting branch with boilerplate code |
| Pinecone index | 1536-dim, cosine, serverless |
| .env file | OpenAI, Pinecone, LangSmith, Tavily API keys |
| Install deps | pipenv install or uv sync |
| Validate | Run ingestion.py → confirm env vars load |