13.03 — Boilerplate Setup

Overview

This lesson covers the project initialization for the Agentic RAG system — setting up the development environment, installing dependencies, configuring API keys, and validating that everything works before writing any domain logic.

This may seem like "boring setup" work, but getting the foundation right is critical. Misconfigured API keys, missing dependencies, or incorrect Python paths will cause cryptic errors later in the project. Taking the time to set things up properly now saves hours of debugging later.

Environment & Dependency Setup

1. Initialize the Project with Poetry

Poetry is used for dependency management and virtual environment isolation.

# Create project directory
mkdir langgraph-course
cd langgraph-course

# Initialize Poetry environment
poetry init

This generates a pyproject.toml file that will track all project dependencies and their versions.

Why Poetry instead of pip? Poetry provides two key advantages over plain pip:

1. Deterministic builds: the poetry.lock file captures exact dependency versions, so every developer (and every CI/CD run) gets the same environment.
2. Virtual environment isolation: Poetry automatically creates and manages a virtual environment for your project, preventing conflicts with other Python projects on your system.

2. Install Dependencies

poetry add \
  beautifulsoup4 \
  langchain \
  langgraph \
  langchainhub \
  langchain-community \
  tavily-python \
  chromadb \
  python-dotenv \
  black \
  isort \
  pytest

Dependency Breakdown

Package	Purpose
`beautifulsoup4`	HTML parsing — used by LangChain's `WebBaseLoader` to scrape web articles for ingestion
`langchain`	Core LangChain framework — chains, prompts, output parsers
`langgraph`	Graph-based orchestration for multi-step agent workflows
`langchainhub`	Pull pre-built prompts from the LangChain Hub (e.g., RAG prompts)
`langchain-community`	Community-maintained document loaders, vector stores, and tools
`tavily-python`	Tavily Search API SDK — optimized search for LLM-powered applications
`chromadb`	Open-source vector database — runs locally, stores embeddings on disk
`python-dotenv`	Load environment variables from `.env` files
`black`	Python code formatter
`isort`	Import statement sorter
`pytest`	Testing framework for running unit and integration tests

[!TIP] Commit poetry.lock alongside pyproject.toml: Poetry then installs the exact same dependency versions for every team member.
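To confirm that the installed packages can actually be located by the interpreter, a small check along these lines may help. This is a minimal sketch rather than part of the course code: the helper name `missing_modules` and the `IMPORT_NAMES` mapping are illustrative, only a subset of the packages is covered, and the import names assume each package's usual top-level module (for example `bs4` for `beautifulsoup4`).

```python
from importlib.util import find_spec

# Assumed mapping from installed package name to its top-level import name.
IMPORT_NAMES = {
    "beautifulsoup4": "bs4",
    "langchain": "langchain",
    "langgraph": "langgraph",
    "langchain-community": "langchain_community",
    "tavily-python": "tavily",
    "chromadb": "chromadb",
    "python-dotenv": "dotenv",
    "pytest": "pytest",
}


def missing_modules(module_names):
    """Return the module names that the current interpreter cannot locate."""
    return [name for name in module_names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(IMPORT_NAMES.values())
    if missing:
        print("Not importable:", ", ".join(missing))
    else:
        print("All listed packages were found.")
```

Run it with the Poetry interpreter (for example `poetry run python check_imports.py`, where the file name is hypothetical) so that it inspects the project's virtual environment rather than the system Python.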

Environment Variables

Create a .env file in the project root:

# OpenAI
OPENAI_API_KEY=sk-your-openai-api-key

# LangSmith Tracing
LANGCHAIN_TRACING_V2=true
LANGCHAIN_PROJECT=crag
LANGCHAIN_API_KEY=ls-your-langsmith-api-key

# Tavily Search
TAVILY_API_KEY=tvly-your-tavily-api-key

# Python Path (for module resolution)
PYTHONPATH=.

Let's understand what each of these does:

OPENAI_API_KEY: Authenticates with OpenAI's API. Used for every LLM call (GPT models for grading and generation) and every embedding call (queries and documents). This is the most critical key.
LANGCHAIN_TRACING_V2=true: Enables LangSmith tracing. Every LLM call, chain invocation, and graph execution is logged to LangSmith for debugging and observability.
LANGCHAIN_PROJECT=crag: Names the tracing project in LangSmith. All traces appear under this project name in the dashboard.
LANGCHAIN_API_KEY: Authenticates with LangSmith. Required when tracing is enabled.
TAVILY_API_KEY: Authenticates with the Tavily search engine. Used by the Web Search Node for fallback retrieval.
PYTHONPATH=. (the project root): Adds the current directory to the module search path for tools that read .env before launching Python, such as an IDE run configuration. Python reads PYTHONPATH only at interpreter startup, so load_dotenv() cannot change the import path of a process that is already running. When you run python main.py from the project root, from graph.nodes import retrieve resolves anyway, because Python automatically puts the script's directory on sys.path.
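Because a missing key usually surfaces much later as an authentication error, it can be worth failing fast at startup. The sketch below is one possible way to do that using only the standard library. The helper names `missing_env_vars` and `mask` are hypothetical, and the list of required variables simply mirrors the .env file shown above.

```python
import os
import sys

# Variable names taken from the .env file above.
REQUIRED_VARS = ("OPENAI_API_KEY", "LANGCHAIN_API_KEY", "TAVILY_API_KEY")


def missing_env_vars(required=REQUIRED_VARS, env=None):
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]


def mask(value, visible=4):
    """Hide all but the last few characters so a key is safe to print."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


if __name__ == "__main__":
    # Call load_dotenv() before this point if the keys live in .env.
    for name in REQUIRED_VARS:
        value = os.environ.get(name, "").strip()
        print(f"{name}: {mask(value) if value else 'MISSING'}")
    if missing_env_vars():
        sys.exit(1)  # non-zero exit code signals a broken setup
```

This only checks that the variables are present. Whether a key is actually valid can be confirmed only by making a real request to the corresponding service.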

API Key Requirements

flowchart LR
    subgraph Required_APIs["Required API Services"]
        A["🔑 OpenAI API"]
        B["🔑 LangSmith API"]
        C["🔑 Tavily API"]
    end

    A -->|"LLM calls, Embeddings"| D["Agentic RAG System"]
    B -->|"Tracing & Debugging"| D
    C -->|"Web Search Fallback"| D

    style A fill:#10b981,color:#fff
    style B fill:#4a9eff,color:#fff
    style C fill:#8b5cf6,color:#fff
    style D fill:#1e293b,color:#fff

Service	How to Get	Used For
OpenAI	platform.openai.com	LLM calls (GPT-3.5/4), embedding generation
LangSmith	smith.langchain.com	Tracing graph execution, debugging chains
Tavily	tavily.com	AI-optimized web search for fallback retrieval

[!WARNING] Never commit your .env file to version control. Add .env to your .gitignore.
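As a quick guard, a script can check that .gitignore really lists the .env file. The following is a deliberately simplified sketch: the function name `lists_env_file` is illustrative, and it only looks for an exact entry, so it does not implement gitignore's glob or negation rules.

```python
from pathlib import Path


def lists_env_file(gitignore_text, name=".env"):
    """Return True if a non-comment line names the file exactly.

    Simplification: real gitignore matching also supports globs and
    negation, which this check deliberately ignores.
    """
    for line in gitignore_text.splitlines():
        entry = line.strip()
        if entry and not entry.startswith("#") and entry in (name, f"/{name}"):
            return True
    return False


if __name__ == "__main__":
    path = Path(".gitignore")
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    print(".env ignored:", lists_env_file(text))
```

For an authoritative answer inside a Git repository, `git check-ignore .env` applies Git's own matching rules.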

Boilerplate Entry Point

Create a main.py file to validate the setup:

from dotenv import load_dotenv

# Read key=value pairs from .env into os.environ before anything else runs
load_dotenv()

if __name__ == "__main__":
    print("Hello Advanced RAG!")

Validation

Run the entry point to confirm the environment is working:

# Run inside the Poetry-managed virtual environment
poetry run python main.py
# Expected output: Hello Advanced RAG!

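A clean run confirms that the interpreter can import python-dotenv and that load_dotenv() executed. It does not prove that the other packages import correctly or that the keys are valid: load_dotenv() returns False without raising when no .env file is found, and nothing in main.py reads the keys yet.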

IDE Configuration (PyCharm)

Open the project directory in PyCharm
PyCharm should auto-detect the Poetry virtual environment
Approve the detected interpreter
Wait for package indexing to complete
Verify in pyproject.toml that all dependency versions are listed

[!NOTE] If using VS Code, point the Python interpreter at the Poetry virtual environment. Running poetry env info --path prints the path to that environment.
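Whichever IDE is used, it is easy to check from Python itself which interpreter is running and whether it belongs to a virtual environment. This is a small standard-library sketch; the function name `in_virtualenv` is illustrative.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter is running inside a virtual environment."""
    # In a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("Virtual environment active:", in_virtualenv())
```

If the printed interpreter path does not sit under the directory reported by `poetry env info --path`, the IDE is most likely using a different Python than the one Poetry manages.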

Summary Checklist

[x] Project directory created
[x] Poetry environment initialized
[x] All dependencies installed
[x] .env file created with API keys
[x] main.py boilerplate runs successfully
[x] IDE configured with correct Python interpreter
[x] LangSmith tracing enabled

[!TIP] GitHub branch reference: 1-start-here