The Python AI ecosystem has consolidated significantly over the past two years. What was a chaotic landscape of competing tools has settled into a recognizable stack. These 7 libraries appear in nearly every serious AI project in 2026 — not because they're fashionable, but because they solve real problems well.
1. openai / anthropic — LLM API Clients
What it does: The official Python SDKs for calling large language model APIs. You send a conversation history, you get a response.
When to use it: Any time you're integrating an LLM into your application. Start here before reaching for a framework — understanding the raw API makes every abstraction above it clearer.
from openai import OpenAI
client = OpenAI(api_key="your-key")
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain embeddings in one sentence."}
    ]
)
print(response.choices[0].message.content)
Alternatives worth knowing: mistralai (Mistral), cohere (Cohere), boto3 for AWS Bedrock, google-genai for Gemini (it supersedes the older google-generativeai package). All follow similar patterns — the client/response model is standardized enough that switching providers usually takes only minor changes.
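To make the parallel concrete, here is the same request expressed with the anthropic SDK — a sketch, with the model name purely illustrative. Two real differences to note: the system prompt is a top-level parameter rather than a message, and max_tokens is required.

```python
import os

# The conversation, in the shared role/content message shape used by
# both providers.
messages = [
    {"role": "user", "content": "Explain embeddings in one sentence."}
]

# Only make the network call when a key is actually configured.
if os.environ.get("ANTHROPIC_API_KEY"):
    from anthropic import Anthropic

    client = Anthropic()
    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # illustrative model name
        max_tokens=200,                    # required by the Messages API
        system="You are a helpful assistant.",  # a parameter, not a message
        messages=messages,
    )
    print(response.content[0].text)
```

The message list carries over unchanged; what moves around is the system prompt and a couple of required parameters — small edits, not a rewrite.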
2. LangChain — Toolkit for LLM Applications
What it does: A framework that provides composable building blocks for LLM applications: chat models, document loaders, text splitters, retrievers, output parsers, and chains that connect them.
When to use it: When you're building something that needs to load documents, split them, embed them, retrieve relevant chunks, and feed them to an LLM — the RAG pattern. LangChain handles the plumbing.
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
llm = ChatOpenAI(model="gpt-4o-mini")
prompt = ChatPromptTemplate.from_messages([
    ("system", "Summarize the following text in {language}."),
    ("user", "{text}")
])
chain = prompt | llm
result = chain.invoke({"text": "Long document here...", "language": "English"})
print(result.content)
Alternatives worth knowing: llama-index (LlamaIndex) — strong for document ingestion and indexing. Haystack — good for enterprise search use cases. For simple use cases, raw API calls are often cleaner than reaching for a framework.
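The document-splitting step mentioned above can be sketched without any framework. This is a naive fixed-size chunker with overlap — an illustration of the idea only; LangChain's text splitters are smarter about respecting sentence and paragraph boundaries.

```python
# Naive fixed-size chunker with overlap: each chunk repeats the tail of
# the previous one so retrieval doesn't lose context at chunk borders.
def chunk_text(text: str, chunk_size: int = 100, overlap: int = 20) -> list[str]:
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # slide forward, keeping some overlap
    return chunks

doc = "word " * 100  # 500-character stand-in for a real document
chunks = chunk_text(doc, chunk_size=100, overlap=20)
print(len(chunks), len(chunks[0]))  # 7 100
```

The overlap means the last 20 characters of one chunk are the first 20 of the next — a common trick so that a sentence cut in half is still retrievable from at least one chunk.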
3. LangGraph — Stateful Agent Orchestration
What it does: Builds on LangChain to add stateful, graph-based agent workflows. You define nodes (agent steps) and edges (transitions between them), and LangGraph manages state across the entire run.
When to use it: When your agent needs to branch, loop, or maintain complex state across multiple steps. A simple ReAct agent can run without LangGraph; a multi-agent system with conditional routing almost always benefits from it.
from langgraph.graph import StateGraph, END
from typing import TypedDict
class AgentState(TypedDict):
    messages: list
    next_step: str

def research_node(state: AgentState) -> AgentState:
    # Call LLM or tools here
    return {**state, "next_step": "write"}

def write_node(state: AgentState) -> AgentState:
    # Generate final output
    return {**state, "next_step": END}
graph = StateGraph(AgentState)
graph.add_node("research", research_node)
graph.add_node("write", write_node)
graph.add_edge("research", "write")
graph.add_edge("write", END)
graph.set_entry_point("research")
app = graph.compile()
Alternatives worth knowing: crewai — higher-level abstraction for multi-agent crews. AutoGen (Microsoft) — agent framework with a focus on conversational multi-agent patterns. LangGraph gives you more control; frameworks like CrewAI give you faster setup.
4. Pydantic — Structured Outputs and Validation
What it does: Data validation and settings management using Python type annotations. Define a schema as a Python class, pass in data, get back a validated and typed object — or a clear error.
When to use it: Everywhere in AI systems. Validating LLM outputs. Defining agent state schemas. Parsing API responses. Managing configuration. If you're coming from Java, Pydantic is what makes Python feel type-safe.
from pydantic import BaseModel, field_validator
class ExpenseReport(BaseModel):
    vendor: str
    amount: float
    category: str
    requires_approval: bool

    @field_validator("amount")
    @classmethod
    def must_be_positive(cls, v):
        if v <= 0:
            raise ValueError("Amount must be positive")
        return v
# Use with LLM structured outputs
from openai import OpenAI
client = OpenAI()
report = client.chat.completions.parse(
model="gpt-4o",
messages=[{"role": "user", "content": "Process: Uber to airport, $47.50, travel"}],
response_format=ExpenseReport,
)
print(report.choices[0].message.parsed)
# ExpenseReport(vendor='Uber', amount=47.5, category='travel', requires_approval=False)
Alternatives worth knowing: dataclasses for simple cases with no validation. attrs — similar to Pydantic but lighter. For LLM output parsing specifically, the instructor library (covered below) wraps Pydantic with additional retry logic.
5. ChromaDB / Qdrant — Vector Databases for RAG
What it does: Stores vector embeddings and retrieves the most semantically similar vectors to a query. The storage layer for RAG (Retrieval-Augmented Generation) systems.
When to use it: Any time your AI system needs to search over a large document set using meaning rather than keywords. "Find all documents relevant to this user question" is a vector search problem.
import chromadb
from chromadb.utils import embedding_functions
client = chromadb.Client()
ef = embedding_functions.OpenAIEmbeddingFunction(
    api_key="your-key",
    model_name="text-embedding-3-small"
)
collection = client.create_collection("docs", embedding_function=ef)
collection.add(
    documents=["Python is a programming language.", "Spring Boot is a Java framework."],
    ids=["doc1", "doc2"]
)
results = collection.query(
    query_texts=["How do I build web services?"],
    n_results=1
)
print(results["documents"])
# [['Spring Boot is a Java framework.']]
ChromaDB vs Qdrant: ChromaDB is easier to get running locally (in-memory, no server required). Qdrant is production-grade with better performance at scale, filtering, and payload storage. Start with Chroma; move to Qdrant when you need production robustness.
Alternatives worth knowing: pgvector (Postgres extension) — if you're already running Postgres, this adds vector search without a separate service. Pinecone — fully managed cloud service, no infrastructure to run.
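Under the hood, every option in this list ranks stored vectors by a similarity metric. Here is a minimal sketch of that core operation with toy 3-dimensional vectors — real embeddings have hundreds or thousands of dimensions, but the math is the same.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Cosine of the angle between two vectors: 1.0 means same direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" standing in for real model output.
store = {
    "doc1": [0.9, 0.1, 0.0],  # pretend: programming-language topic
    "doc2": [0.1, 0.9, 0.2],  # pretend: web-framework topic
}
query = [0.2, 0.8, 0.1]  # pretend: "How do I build web services?"

best = max(store, key=lambda doc_id: cosine_similarity(store[doc_id], query))
print(best)  # doc2 — the nearest vector wins
```

A vector database is, at its core, this nearest-vector lookup made fast (approximate-nearest-neighbor indexes) and durable (persistence, filtering, metadata).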
6. FastAPI — Serving AI APIs
What it does: A modern Python web framework for building APIs. Fast to write, automatically generates OpenAPI docs, async-native, and integrates well with Pydantic for request/response validation.
When to use it: Wrapping your AI logic in an HTTP API. If you've built REST APIs in Spring Boot, FastAPI will feel familiar — annotations on functions, request/response typing, dependency injection — just lighter and in Python.
from fastapi import FastAPI
from pydantic import BaseModel
from openai import OpenAI
app = FastAPI()
client = OpenAI()
class SummarizeRequest(BaseModel):
    text: str
    max_sentences: int = 3

class SummarizeResponse(BaseModel):
    summary: str

@app.post("/summarize", response_model=SummarizeResponse)
async def summarize(req: SummarizeRequest):
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": f"Summarize in {req.max_sentences} sentences: {req.text}"
        }]
    )
    return SummarizeResponse(summary=response.choices[0].message.content)
Alternatives worth knowing: Flask — simpler, synchronous, good for small services. Django REST Framework — full-featured but heavier. For AI services specifically, FastAPI's async support and Pydantic integration make it the default choice.
7. Instructor — Structured LLM Outputs
What it does: A thin wrapper around LLM clients that makes it reliable to extract structured Pydantic objects from LLM responses. Handles retry logic when the LLM returns malformed output.
When to use it: When you need the LLM to return a structured object and you need it to actually work reliably in production. Raw structured output calls occasionally produce invalid JSON or miss required fields; Instructor retries automatically.
import instructor
from openai import OpenAI
from pydantic import BaseModel
client = instructor.from_openai(OpenAI())
class SupportTicket(BaseModel):
    priority: str  # "low", "medium", "high", "critical"
    category: str
    one_line_summary: str

ticket = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=SupportTicket,
    messages=[{
        "role": "user",
        "content": "My production database is down and users can't log in."
    }]
)
print(ticket.priority) # critical
print(ticket.category) # infrastructure
print(ticket.one_line_summary) # Production database outage preventing user logins
Alternatives worth knowing: LangChain's PydanticOutputParser — similar idea, but less ergonomic. Direct structured output via the response_format parameter — works well with newer models, no retry logic. Instructor is the pragmatic choice when reliability matters.
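The retry behavior Instructor automates can be sketched without the library. Everything here is illustrative: `call_llm` is a hypothetical stand-in for a real LLM call (scripted so the first reply is malformed), and the validation is stdlib-only rather than Pydantic.

```python
import json

def call_llm(prompt: str, attempt: int) -> str:
    # Hypothetical stand-in for a real LLM call: the first reply is
    # malformed; the retry (with the error fed back) succeeds.
    if attempt == 0:
        return '{"priority": "critical"'  # truncated JSON
    return '{"priority": "critical", "category": "infrastructure"}'

def extract_with_retries(prompt: str, required: set[str], max_retries: int = 3) -> dict:
    error = ""
    for attempt in range(max_retries):
        raw = call_llm(prompt + error, attempt)
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as e:
            error = f"\nPrevious reply was invalid JSON ({e}); try again."
            continue
        missing = required - data.keys()
        if missing:
            error = f"\nPrevious reply was missing fields {missing}; try again."
            continue
        return data  # parseable JSON with all required fields present

    raise RuntimeError("LLM never produced a valid object")

ticket = extract_with_retries("Classify: prod DB down", {"priority", "category"})
print(ticket["priority"])  # critical
```

Instructor does this loop for you — with the Pydantic validation error serialized back into the conversation — which is exactly why it tends to be more reliable in production than a single structured-output call.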
The Stack in Practice
In a typical production AI system, these libraries compose together:
- API layer: FastAPI receives requests
- Validation: Pydantic validates inputs and outputs
- LLM calls: openai or anthropic SDK (via Instructor for structured outputs)
- Retrieval: ChromaDB or Qdrant for semantic search
- Orchestration: LangGraph for multi-step agent workflows
- Components: LangChain utilities for document loading and splitting
You don't need all 7 for every project. A simple summarization API needs FastAPI, the LLM SDK, and Pydantic. A full agentic RAG system uses the whole stack.
All 7 of these libraries — how they work, when to use them, and how they fit into production AI systems — are covered in depth in the MindloomHQ Agentic AI curriculum. Phases 0 and 1 are completely free, no payment required. Start there →