Most tutorials on AI agents start with LangChain. This one doesn't.
Understanding how agents work at the bare-metal level makes you a dramatically better developer when you eventually reach for a framework. You'll know what it's actually doing, when it's getting in your way, and how to debug it when things go wrong.
Let's build a working AI agent from scratch — no LangChain, no LlamaIndex, just the third-party httpx library and a raw API call.
## What the ReAct Pattern Actually Is
Before writing code, you need to understand ReAct — the pattern underlying virtually every production AI agent.
ReAct stands for Reasoning + Acting. The loop looks like this:
- Thought: The model reasons about what to do next
- Action: The model decides which tool to call and with what input
- Observation: You run the tool and feed the result back
- Repeat until the model declares it's done
That's it. The magic isn't some complicated algorithm — it's the model deciding what to do next based on what it has observed so far. You're engineering the prompt that enables this loop.
Here's a concrete example. Goal: "What is the population of Tokyo, and how does it compare to New York City?"
Without ReAct, a chatbot guesses or refuses. With ReAct:
- Thought: I need to look up Tokyo's current population.
- Action: web_search("Tokyo population 2026")
- Observation: "Tokyo has approximately 13.96 million people in the city proper."
- Thought: Now I need New York's population.
- Action: web_search("New York City population 2026")
- Observation: "New York City has approximately 8.3 million people."
- Thought: I have both numbers. Tokyo is about 68% larger. I can answer now.
- Final Answer: Tokyo (13.96M) is significantly larger than New York City (8.3M).
The agent's behavior is grounded in actual data, not model weights. This is the fundamental shift that makes agents useful for real-world work.
## Setting Up the Project
You need Python 3.11+ and two external libraries:

```bash
pip install httpx python-dotenv
```
Create a .env file:

```
OPENAI_API_KEY=your_key_here
```
We'll use the OpenAI API, but the same code works with any OpenAI-compatible endpoint (Groq, Together, local Ollama, Anthropic via an adapter).
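To make that swap concrete, here's a minimal sketch that reads the base URL from an environment variable; the `OPENAI_BASE_URL` name is my own convention here, not an official OpenAI setting:

```python
import os

def api_config() -> tuple[str, dict]:
    """Build the chat-completions URL and auth header from environment
    variables, so the same agent code can target any OpenAI-compatible
    endpoint (OpenAI, Groq, Together, or a local Ollama server)."""
    base = os.getenv("OPENAI_BASE_URL", "https://api.openai.com/v1").rstrip("/")
    key = os.getenv("OPENAI_API_KEY", "")
    return f"{base}/chat/completions", {"Authorization": f"Bearer {key}"}
```

With this in place, pointing at a local Ollama server is just a matter of setting `OPENAI_BASE_URL=http://localhost:11434/v1` in your `.env`.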
## Step 1: Define Your Tools
A tool is just a Python function with a clear name and docstring. The LLM reads the description and decides when to call it.
```python
import json
import re
import os
import inspect
from typing import Callable

import httpx
from dotenv import load_dotenv

load_dotenv()

# Tool registry — maps tool names to functions
TOOLS: dict[str, Callable] = {}

def tool(fn: Callable) -> Callable:
    """Decorator to register a function as an agent tool."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def calculate(expression: str) -> str:
    """
    Evaluate a basic arithmetic expression.
    Input: a Python arithmetic expression like '(42 * 3.14) / 100'
    Returns: the numeric result as a string.
    """
    try:
        # Stripping __builtins__ limits the attack surface, but eval is not
        # a true sandbox. Fine for a demo; never use with untrusted input.
        result = eval(expression, {"__builtins__": {}}, {})
        return str(result)
    except Exception as e:
        return f"Error: {e}"

@tool
def get_current_time() -> str:
    """
    Returns the current date and time in ISO 8601 format.
    Use this when the user asks about the current time or date.
    """
    from datetime import datetime, timezone
    return datetime.now(timezone.utc).isoformat()

@tool
def count_words(text: str) -> str:
    """
    Count the number of words in a piece of text.
    Input: any string of text.
    Returns: word count as a string.
    """
    return str(len(text.split()))
```
These are toy tools. A real agent would have web search, code execution, database queries. But the pattern is identical regardless.
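For a taste of what a slightly more realistic tool looks like, here is a sketch of a file-reading tool; the registry decorator is repeated so the snippet runs on its own, and the 2000-character truncation is an arbitrary choice to keep observations short:

```python
from pathlib import Path
from typing import Callable

# Same registry pattern as above, repeated so this snippet is standalone
TOOLS: dict[str, Callable] = {}

def tool(fn: Callable) -> Callable:
    TOOLS[fn.__name__] = fn
    return fn

@tool
def read_file(path: str) -> str:
    """
    Read a UTF-8 text file and return its contents, truncated to 2000 characters.
    Input: a file path relative to the working directory.
    Returns: the file text, or an error message.
    """
    try:
        return Path(path).read_text(encoding="utf-8")[:2000]
    except OSError as e:
        # Return errors as strings so the model can see and react to them
        return f"Error: {e}"
```

Note that the tool returns errors as strings rather than raising: the observation goes back to the model, which can then reason about the failure.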
## Step 2: Build Tool Descriptions Automatically
The LLM needs to know what tools it has available. Generate this from function signatures and docstrings — no manual maintenance:
```python
def get_tool_descriptions() -> str:
    lines: list[str] = []
    for name, fn in TOOLS.items():
        doc = fn.__doc__ or "No description."
        sig = inspect.signature(fn)
        params = list(sig.parameters.keys())
        param_str = ", ".join(params) if params else "no parameters"
        lines.append(f"- {name}({param_str}): {doc.strip()}")
    return "\n".join(lines)
```
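To see what the model will actually receive, here's a self-contained run with a single registered tool (registry and helper repeated so the snippet stands alone):

```python
import inspect
from typing import Callable

TOOLS: dict[str, Callable] = {}

def tool(fn: Callable) -> Callable:
    TOOLS[fn.__name__] = fn
    return fn

@tool
def count_words(text: str) -> str:
    """Count the number of words in a piece of text."""
    return str(len(text.split()))

def get_tool_descriptions() -> str:
    lines = []
    for name, fn in TOOLS.items():
        params = ", ".join(inspect.signature(fn).parameters) or "no parameters"
        lines.append(f"- {name}({params}): {(fn.__doc__ or 'No description.').strip()}")
    return "\n".join(lines)

print(get_tool_descriptions())
# → - count_words(text): Count the number of words in a piece of text.
```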
## Step 3: The System Prompt — This Is Where ReAct Lives
Your system prompt teaches the LLM the format it must follow. Be exact:
```python
SYSTEM_PROMPT = """You are a helpful AI assistant with access to tools.

To use a tool, respond with EXACTLY this format — no extra text:

Thought: [your reasoning about what to do]
Action: [tool_name]
Action Input: [input as a JSON string]

When you have a final answer, respond with:

Thought: [your reasoning]
Final Answer: [your answer to the user]

Available tools:
{tool_descriptions}

Rules:
- Use tools when you need real data (time, calculations, etc.)
- Never fabricate tool results — always call the tool
- After an observation, continue your reasoning
- Once you have everything you need, write Final Answer
"""
```
## Step 4: The Agent Loop
This is the core. It runs the model, parses its response, calls tools, and feeds results back:
```python
def call_llm(messages: list[dict]) -> str:
    response = httpx.post(
        "https://api.openai.com/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.getenv('OPENAI_API_KEY')}"},
        json={
            "model": "gpt-4o-mini",
            "messages": messages,
            "temperature": 0.1,
            "max_tokens": 1024,
        },
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]

def parse_action(response: str) -> tuple[str, str] | None:
    action_match = re.search(r"Action:\s*(\w+)", response)
    input_match = re.search(r"Action Input:\s*(.+?)(?:\n|$)", response, re.DOTALL)
    if action_match and input_match:
        return action_match.group(1).strip(), input_match.group(1).strip()
    return None

def run_agent(goal: str, max_steps: int = 10) -> str:
    messages: list[dict] = [
        {
            "role": "system",
            "content": SYSTEM_PROMPT.format(
                tool_descriptions=get_tool_descriptions()
            ),
        },
        {"role": "user", "content": goal},
    ]

    for step in range(max_steps):
        print(f"\n--- Step {step + 1} ---")
        response = call_llm(messages)
        print(response)

        if "Final Answer:" in response:
            final = re.search(r"Final Answer:\s*(.+)", response, re.DOTALL)
            return final.group(1).strip() if final else response

        action = parse_action(response)
        if not action:
            return response  # LLM didn't follow format — treat as answer

        tool_name, tool_input_raw = action
        messages.append({"role": "assistant", "content": response})

        # Execute the tool
        if tool_name not in TOOLS:
            observation = f"Error: Tool '{tool_name}' not found. Available: {list(TOOLS.keys())}"
        else:
            try:
                try:
                    parsed = json.loads(tool_input_raw)
                except json.JSONDecodeError:
                    parsed = tool_input_raw  # not valid JSON — pass the raw text
                if isinstance(parsed, dict):
                    observation = TOOLS[tool_name](**parsed)
                elif parsed in ("", None):
                    observation = TOOLS[tool_name]()  # zero-argument tool
                else:
                    observation = TOOLS[tool_name](parsed)
            except Exception as e:
                observation = f"Tool error: {e}"

        print(f"\nObservation: {observation}")
        messages.append({"role": "user", "content": f"Observation: {observation}"})

    return "Agent reached maximum steps without a final answer."
```
## Step 5: Run It
```python
if __name__ == "__main__":
    result = run_agent(
        "What is the current time? Also, calculate what 17% of 4,320 is."
    )
    print(f"\n=== Final Answer ===\n{result}")
```
Sample output:
```
--- Step 1 ---
Thought: The user wants the current time and a calculation. I'll start with the time.
Action: get_current_time
Action Input: ""

Observation: 2026-03-23T14:22:11+00:00

--- Step 2 ---
Thought: I have the time. Now I need to calculate 17% of 4320.
Action: calculate
Action Input: "4320 * 0.17"

Observation: 734.4

--- Step 3 ---
Thought: I have both answers. I can respond now.
Final Answer: The current time is 2:22 PM UTC on March 23, 2026. 17% of 4,320 is 734.4.
```
The agent broke the problem into two separate tool calls, executed them in sequence, and synthesized a clean final answer. This is the same core loop that powers production agent systems.
## What to Add Next
This roughly 100-line agent captures the core architecture, but it is not production-ready. For real use cases, you'd add:
**Conversation memory** — store past exchanges so the agent can refer back to earlier parts of a conversation. A sliding window (last N messages) keeps you within context limits.
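A sliding window can be sketched in a few lines; the `trim_history` name and the default of 20 messages are my choices, not part of the code above:

```python
def trim_history(messages: list[dict], max_messages: int = 20) -> list[dict]:
    """Keep the system prompt plus the most recent messages.
    A simple sliding window; real systems often summarize older turns instead."""
    if len(messages) <= max_messages:
        return messages
    # messages[0] is the system prompt and must always survive
    return messages[:1] + messages[1 - max_messages:]
```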
**Structured tool schemas** — pass tool definitions as JSON Schema to the model and use function calling. More reliable than regex parsing, and it eliminates the Action Input: format entirely.
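A sketch of generating those schemas from plain functions, assuming every parameter is a string (true of the toy tools here); the outer `{"type": "function", ...}` wrapper matches the shape OpenAI's chat API expects in its `tools` parameter:

```python
import inspect
from typing import Callable

def to_openai_schema(fn: Callable) -> dict:
    """Convert a tool function into an OpenAI function-calling definition.
    Assumes all parameters are strings, which holds for the toy tools above."""
    params = list(inspect.signature(fn).parameters)
    return {
        "type": "function",
        "function": {
            "name": fn.__name__,
            "description": (fn.__doc__ or "").strip(),
            "parameters": {
                "type": "object",
                "properties": {p: {"type": "string"} for p in params},
                "required": params,
            },
        },
    }
```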
**Streaming** — use SSE (Server-Sent Events) to display tokens as they arrive. The httpx library handles this natively with client.stream().
**Error recovery** — if a tool fails three times in a row, escalate to the user rather than looping forever. Add a max_retries_per_tool counter.
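One minimal way to track repeated failures; the `should_escalate` helper and module-level counter are my naming, not part of the agent above:

```python
from collections import Counter

MAX_RETRIES_PER_TOOL = 3
_failures: Counter = Counter()

def should_escalate(tool_name: str) -> bool:
    """Record one failure for this tool. Returns True once the tool has
    failed MAX_RETRIES_PER_TOOL times, signaling the loop to stop retrying
    and hand control back to the user."""
    _failures[tool_name] += 1
    return _failures[tool_name] >= MAX_RETRIES_PER_TOOL
```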
## The Difference Between This and a Framework
LangChain, LlamaIndex, and similar frameworks do exactly what we just built — plus hundreds of pre-built integrations (web search, databases, vector stores). Understanding this bare-metal version means you'll know precisely what the framework is doing, when to extend it, and how to debug it when the abstraction leaks.
If you want to go deeper — multi-step planning, memory systems, multi-agent collaboration — Phase 3: AI Agents in the Agentic AI course at MindloomHQ covers all of it with 12 structured lessons and hands-on projects.