The term "AI agent" is used to describe everything from a basic chatbot with a few if-statements to a fully autonomous system that completes multi-step tasks without human involvement. That range makes the term almost useless unless you pin down a precise definition.
This guide is for developers. It defines what an agent actually is, explains the architectural pattern that makes agents work, walks through the major types, and shows you a minimal working example in Python.
Agent vs. Chatbot: The Core Distinction
A chatbot takes a message and returns a response. The interaction is stateless from the system's perspective: one input, one output, done. Even a sophisticated chatbot with a long conversation history is still fundamentally doing one-shot responses.
An agent is different in three ways:
1. It has a goal, not just a prompt. You give an agent an objective — "research this company and summarize their funding history" — not just a question.
2. It decides what actions to take. The agent determines which tools to use and in what order, based on intermediate results. You do not script the steps.
3. It loops. An agent runs multiple iterations, observing results from previous steps and adjusting its plan. The loop continues until the goal is achieved or the agent determines it cannot proceed.
A chatbot answers a question. An agent completes a task.
The Observe-Think-Act Loop
Every agent implementation — whether raw Python, LangChain, LangGraph, or a custom framework — runs some version of this loop:
```python
while not goal_achieved:
    observation = perceive_current_state()
    action = llm.reason(goal, history, observation)
    result = execute(action)
    history.append(result)
    if action.is_final_answer:
        break
```
This is the ReAct pattern (Reasoning + Acting), described in a 2022 paper by Yao et al. that is still the foundation of how production agents work in 2026.
Observe: What is the current state? What did the last action return? What does the agent know right now?
Think: The LLM receives the goal, the full history, and the current observation. It decides what to do next — which tool to call, what arguments to pass, or whether the task is complete. This is the reasoning step.
Act: Execute the chosen action. This might call a web search API, query a database, run a calculation, or read a file. The result feeds back into the next observation.
The loop is what makes agents different from standard LLM calls. You are running the LLM in a feedback cycle, not just once.
A Minimal Working Agent in Python
No frameworks. Just the core loop using the Anthropic API.
```python
from anthropic import Anthropic

client = Anthropic()

# --- Tool definitions ---

def search_web(query: str) -> str:
    # Replace with a real search API (Tavily, Serper, etc.) in production
    return f"Search results for '{query}': Found 3 relevant articles."

def calculate(expression: str) -> str:
    try:
        # Use ast.literal_eval or a safe math parser in production
        result = eval(expression, {"__builtins__": {}})
        return str(result)
    except Exception as e:
        return f"Error: {e}"

TOOLS = {
    "search_web": search_web,
    "calculate": calculate,
}

TOOL_SCHEMAS = [
    {
        "name": "search_web",
        "description": "Search the web for current information",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Search query"}
            },
            "required": ["query"],
        },
    },
    {
        "name": "calculate",
        "description": "Evaluate a mathematical expression and return the result",
        "input_schema": {
            "type": "object",
            "properties": {
                "expression": {"type": "string", "description": "Math expression, e.g. '(42 * 1.15) / 3'"}
            },
            "required": ["expression"],
        },
    },
]

# --- Agent loop ---

def run_agent(goal: str, max_steps: int = 8) -> str:
    messages = [{"role": "user", "content": goal}]
    for step in range(max_steps):
        response = client.messages.create(
            model="claude-haiku-4-5-20251001",
            max_tokens=1024,
            system=(
                "You are a capable agent. Use tools to complete the user's goal. "
                "When the goal is fully achieved, respond with your final answer "
                "without calling any tools."
            ),
            tools=TOOL_SCHEMAS,
            messages=messages,
        )

        # Append assistant turn to history
        messages.append({"role": "assistant", "content": response.content})

        # No tool use = agent is done
        if response.stop_reason == "end_turn":
            for block in response.content:
                if hasattr(block, "text"):
                    return block.text
            return "Task complete."

        # Execute tool calls and collect results
        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                print(f"[Step {step + 1}] {block.name}({block.input})")
                fn = TOOLS.get(block.name)
                result = fn(**block.input) if fn else f"Unknown tool: {block.name}"
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": result,
                })
        messages.append({"role": "user", "content": tool_results})

    return "Max steps reached without completing the goal."

if __name__ == "__main__":
    result = run_agent("What is 15% of 847, and search for recent news about LLM agents?")
    print("\nFinal answer:", result)
```
The LLM decides which tools to call and in what order. You never specify the workflow. That is the agent.
Types of Agents
Not all agents are built the same way. The architecture depends on what the agent needs to do.
Reactive agents respond to the current state without planning ahead. They are fast and predictable but cannot handle tasks that require multi-step reasoning. A ticket-routing agent that classifies and assigns support tickets based on content is reactive. These are appropriate when the action space is small and the task is well-defined.
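The reactive shape fits in a few lines. In this sketch, the queues and keywords are invented for illustration; a real router would likely replace the keyword rules with an LLM classification call, but the structure stays the same: one observation, one rule-driven action, no loop.

```python
# A minimal reactive router: decides from the current message alone,
# with no planning and no multi-step reasoning.
# Queues and keywords below are illustrative, not from a real system.
ROUTES = {
    "billing": ["invoice", "charge", "refund"],
    "auth": ["password", "login", "2fa"],
}

def route_ticket(text: str) -> str:
    """Map a support ticket to a queue based only on its content."""
    lowered = text.lower()
    for queue, keywords in ROUTES.items():
        if any(keyword in lowered for keyword in keywords):
            return queue
    return "general"  # fallback queue when nothing matches
```
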
Deliberative agents plan before acting. They reason about sequences of steps, consider alternatives, and may build an explicit plan before executing it. A coding agent that analyzes a bug report, traces through the relevant code, writes a fix, and runs tests is deliberative. These are slower but handle open-ended tasks much better.
Multi-agent systems use multiple specialized agents that collaborate or divide work. One agent researches, another writes, a third reviews. These are powerful for complex tasks but add significant overhead in coordination and failure handling.
Hierarchical agents have an orchestrator that breaks goals into subtasks and delegates to worker agents. The orchestrator never does leaf-level work — it only coordinates. LangGraph's supervisor pattern is a common implementation.
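The hierarchical shape can be sketched without any framework. In the sketch below, the worker functions and the fixed two-step decomposition are placeholders; in a real system an LLM would produce the subtask plan and each worker would itself be an agent. The point is the division of labor: the orchestrator only decomposes and delegates, never does leaf-level work.

```python
# Hierarchical sketch: an orchestrator delegates subtasks to workers.
# Worker bodies and the hardcoded plan are stand-ins for LLM-driven steps.
def research_worker(topic: str) -> str:
    return f"notes on {topic}"

def writing_worker(notes: str) -> str:
    return f"draft based on: {notes}"

WORKERS = {"research": research_worker, "write": writing_worker}

def orchestrate(goal: str) -> str:
    # Decompose the goal into (worker, input) subtasks, run them in order,
    # and feed each result into the next subtask.
    plan = [("research", goal), ("write", None)]
    result = None
    for worker_name, arg in plan:
        result = WORKERS[worker_name](arg if arg is not None else result)
    return result
```
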
When to Use an Agent vs. Something Simpler
Agents introduce real complexity: longer latency, harder debugging, unpredictable behavior, and higher cost. Before building one, ask whether something simpler works.
Use a simple LLM call when: the task is a single transformation — summarize, classify, extract, reformat. One prompt, one response, done.
Use a chain (sequential prompts) when: the task has a fixed number of steps that always run in the same order. A content pipeline that extracts, enriches, then formats is a chain — not an agent.
Use an agent when: the number of steps is not known in advance, the task requires branching based on what the agent discovers during execution, or the agent needs to retry and recover from partial failures.
Most tasks are chains, not agents. Use the simplest architecture that gets the job done.
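To make the chain-vs-agent distinction concrete: a chain is plain function composition with a fixed order, every run. The three stage functions below are stand-ins for LLM calls; only the shape matters.

```python
# A chain: fixed steps, fixed order, no decisions at runtime.
# Each stage function is a placeholder for an LLM prompt.
def extract(text: str) -> dict:
    return {"title": text.split(".")[0]}

def enrich(record: dict) -> dict:
    return {**record, "word_count": len(record["title"].split())}

def format_output(record: dict) -> str:
    return f"{record['title']} ({record['word_count']} words)"

def content_pipeline(text: str) -> str:
    # The pipeline is just composition -- no loop, no branching.
    return format_output(enrich(extract(text)))
```

If you find yourself adding "if the extraction looks wrong, retry with a different approach" logic, that is the signal the task has outgrown a chain.
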
What Makes Agents Hard in Production
Non-determinism. The same goal can take different paths on different runs. Testing is harder because you cannot test a fixed execution trace — you need to evaluate outcomes.
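One way to evaluate outcomes rather than traces is to assert properties of the final answer. The checks below are illustrative only, keyed to the example goal from earlier (15% of 847 is 127.05); a real evaluation suite would run many goals and aggregate pass rates.

```python
# Outcome-based evaluation sketch: instead of asserting a fixed sequence
# of tool calls, assert properties the final answer must satisfy.
def evaluate_outcome(answer: str) -> dict:
    checks = {
        # 15% of 847 = 127.05; the correct value must appear in the answer
        "has_correct_value": "127.05" in answer,
        "nonempty": bool(answer.strip()),
    }
    return {"passed": all(checks.values()), "checks": checks}
```
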
Failure cascading. If step 3 returns bad data, step 5 may silently produce wrong results. Add explicit error handling at every tool boundary, not just around the loop.
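One pattern for the tool boundary is a wrapper that never raises: every tool call returns either a result or a structured error string the LLM can see and react to. This is a sketch, not the only approach; some systems prefer retries or typed error objects.

```python
# Error handling at the tool boundary: failures become visible results
# the agent can reason about, instead of exceptions that kill the loop.
def safe_execute(tools: dict, name: str, args: dict) -> str:
    fn = tools.get(name)
    if fn is None:
        return f"Error: unknown tool '{name}'"
    try:
        return str(fn(**args))
    except TypeError as e:
        # Bad or missing arguments -- the LLM can correct them on retry
        return f"Error: bad arguments for '{name}': {e}"
    except Exception as e:
        return f"Error: '{name}' failed: {e}"
```
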
Context window limits. Long agent runs accumulate large histories. At some point the history exceeds the context window. You need a strategy to summarize or truncate older turns without losing critical information.
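The simplest version of such a strategy is to keep the first message (the goal) plus the most recent turns and drop the middle. Real systems often summarize the dropped turns instead of discarding them; this sketch just bounds the history length.

```python
# Naive history trimming: keep the goal and the newest turns.
# A production system would summarize dropped turns rather than lose them.
def trim_history(messages: list, max_messages: int = 6) -> list:
    if len(messages) <= max_messages:
        return messages
    # Always preserve the first message (the original goal),
    # then fill the remaining budget with the most recent turns.
    return [messages[0]] + messages[-(max_messages - 1):]
```
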
Tool reliability. Agents are only as good as their tools. A search tool that returns stale data, an API that silently times out, a parser that drops records — each degrades agent performance in ways that are hard to detect.
These are solvable problems. The engineers who ship reliable agent systems are the ones who design for production concerns from the start.
Going Deeper
If you want to go from this foundation to building agents that work reliably in production, Phase 3 of the Agentic AI course at MindloomHQ covers exactly that.
The 12 lessons cover memory management (how agents track state across long tasks), tool design patterns, error recovery and retry logic, streaming agent output to frontends, and evaluation strategies for non-deterministic systems. Every lesson has full code implementations — not snippets — and a real project to build.
Phases 0 and 1 are completely free to start, no credit card required.