Prompt engineering has a reputation problem. Some people treat it as a professional discipline with frameworks and certifications. Others dismiss it as "just talking to the AI." Both views miss what it actually is: a set of techniques for reliably extracting useful behavior from language models.
This guide covers the techniques that matter in 2026, how to choose between them, and where prompting stops being the right tool.
The Baseline: Zero-Shot Prompting
Zero-shot prompting means giving the model a task with no examples. Just the instruction.
Extract the company name, role, and salary from this job posting:
[job posting text]
This works well for tasks within the model's training distribution — summarization, classification, entity extraction, basic reformatting. If the task is common enough that the model has seen thousands of examples during training, you often do not need anything more sophisticated.
Zero-shot breaks down when the task is unusual, requires specific formatting the model does not default to, or demands consistent output structure across thousands of calls.
Few-Shot Prompting: Show, Don't Tell
Few-shot prompting gives the model examples before asking it to complete your task. The model infers the pattern from the examples rather than from your description.
Extract product data as JSON.
Input: "AirPods Pro 2nd Gen - $249 - Currently unavailable"
Output: {"name": "AirPods Pro 2nd Gen", "price": 249, "in_stock": false}
Input: "Sony WH-1000XM6 - $349.99 - In Stock"
Output: {"name": "Sony WH-1000XM6", "price": 349.99, "in_stock": true}
Input: "Bose QuietComfort Ultra - $429 - Ships in 2 weeks"
Output:
Three things matter for good few-shot examples: they should cover the edge cases you care about (not just the easy path), the format should exactly match what you want back, and examples should be diverse rather than all similar.
Two to five examples cover most cases. More than eight usually does not help and eats context window.
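This pattern is easy to assemble programmatically, so examples can live in data rather than in hard-coded strings. A minimal sketch (the helper name and layout are illustrative, not from any particular library):

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble a few-shot prompt: instruction, worked examples, then the query."""
    parts = [instruction, ""]
    for inp, out in examples:
        parts.append(f"Input: {inp}")
        parts.append(f"Output: {out}")
        parts.append("")
    # End with an open "Output:" so the model completes the pattern
    parts.append(f"Input: {query}")
    parts.append("Output:")
    return "\n".join(parts)

examples = [
    ('"AirPods Pro 2nd Gen - $249 - Currently unavailable"',
     '{"name": "AirPods Pro 2nd Gen", "price": 249, "in_stock": false}'),
    ('"Sony WH-1000XM6 - $349.99 - In Stock"',
     '{"name": "Sony WH-1000XM6", "price": 349.99, "in_stock": true}'),
]
prompt = build_few_shot_prompt(
    "Extract product data as JSON.", examples,
    '"Bose QuietComfort Ultra - $429 - Ships in 2 weeks"',
)
```

Keeping examples in a list also makes it trivial to swap them per task or rotate in new edge cases as you find them.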
Chain-of-Thought: Making the Model Reason
Chain-of-thought (CoT) prompting asks the model to work through its reasoning before giving the final answer. This consistently improves performance on tasks that require multi-step logic.
The simplest version is the phrase "Let's think step by step." A more controlled version specifies the reasoning format:
You are analyzing a customer support ticket. Before classifying it:
1. Identify the core issue the customer is describing
2. Note any urgency signals (words like "urgent", "ASAP", "broken")
3. Check if this is a billing, technical, or account issue
Then output your classification.
Ticket: [ticket text]
CoT helps because LLMs generate tokens sequentially — working through intermediate steps gives the model "compute" to spend on the problem before committing to an answer. Skipping the reasoning and asking for the final answer directly means the model has less capacity to handle complexity.
The tradeoff is tokens. CoT responses are longer and cost more. For high-volume classification where you are calling the API a million times a day, a simpler prompt at lower cost may be the right engineering decision.
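When the reasoning is for the model's benefit rather than the user's, you still need to pull the final answer out of the response. One sketch, assuming the prompt instructs the model to end with a line like `Classification: billing` (the label format is an assumption, not an API feature):

```python
import re

def extract_classification(response_text):
    """Pull the final label from a chain-of-thought response that ends
    with a line like 'Classification: billing'. Returns None if absent."""
    match = re.search(r"Classification:\s*(\w+)", response_text)
    return match.group(1).lower() if match else None

cot_response = """1. The customer cannot access their invoices.
2. No urgency signals present.
3. This concerns charges and invoices, not a technical failure.
Classification: billing"""
```

Asking for a fixed final line like this gives you the accuracy benefit of the reasoning while keeping the downstream parsing trivial.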
System Prompts: The Foundation Layer
System prompts are the instructions that persist across an entire conversation. They set context, define behavior, establish constraints, and assign the model a role. Done well, they offer the most leverage per token of anything in your prompt.
A well-structured system prompt covers:
Identity and role: What is this model supposed to be? A customer support agent for a specific company? A code reviewer for a Python codebase? Make it specific.
What it should and should not do: Explicit constraints beat vague instructions. "Do not discuss competitor products" is clearer than "stay on topic."
Output format: If you want structured responses every time, specify the exact format in the system prompt, not per-request.
Tone and style: Formal or casual? Concise or detailed? If this matters for your use case, say so.
You are a senior code reviewer for a TypeScript codebase. When reviewing code:
- Focus on correctness, then readability, then performance
- Flag any security vulnerabilities immediately as [SECURITY]
- Suggest alternatives, do not just criticize
- Keep feedback under 200 words per file
- Output format: numbered list of issues, then a one-sentence summary
That system prompt will produce more consistent results than any amount of per-request instruction.
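One way to keep the four components above explicit in code is to assemble the system prompt from named parts. This is a hypothetical helper, not a library API; the section labels are assumptions:

```python
def build_system_prompt(role, constraints=None, output_format=None, tone=None):
    """Join the core system-prompt components (identity, constraints,
    format, tone) into a single string, skipping anything unset."""
    sections = [role]
    if constraints:
        sections.append("Rules:\n" + "\n".join(f"- {c}" for c in constraints))
    if output_format:
        sections.append(f"Output format: {output_format}")
    if tone:
        sections.append(f"Tone: {tone}")
    return "\n\n".join(sections)

system = build_system_prompt(
    role="You are a senior code reviewer for a TypeScript codebase.",
    constraints=[
        "Focus on correctness, then readability, then performance",
        "Flag any security vulnerabilities immediately as [SECURITY]",
        "Suggest alternatives, do not just criticize",
    ],
    output_format="numbered list of issues, then a one-sentence summary",
)
```

Building the prompt from structured parts makes it easy to review which component is missing when behavior drifts, and to version each part independently.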
Structured Outputs: Reliability at Scale
When you are building applications, you need JSON back, not prose with JSON buried somewhere in the middle. Many modern APIs offer structured output or tool-use modes that constrain responses to a schema; even without those, a strict system prompt gets you most of the way there.
With the Anthropic API, describe the schema in the system prompt and instruct the model to return JSON only:
import anthropic
import json
client = anthropic.Anthropic()
response = client.messages.create(
model="claude-haiku-4-5-20251001",
max_tokens=512,
system="Extract job data from postings. Return valid JSON only.",
messages=[{
"role": "user",
"content": f"Extract from this posting:\n\n{job_posting}"
}]
)
# Parse the response text as JSON; add error handling for malformed output in production
data = json.loads(response.content[0].text)
Combine this with explicit schema definition in the system prompt — include the field names, types, and which are optional — and you get reliable structured extraction at scale without fragile regex parsing.
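In practice, models occasionally wrap JSON in code fences or add a stray sentence around it. A defensive parsing step keeps the pipeline from crashing on those cases. This is a sketch; the required field names come from the job-posting example earlier and are an assumed schema:

```python
import json
import re

REQUIRED_FIELDS = {"company", "role", "salary"}  # assumed schema for this example

def parse_job_json(text):
    """Extract and validate a JSON object from model output.
    Tolerates surrounding prose and code fences; raises on bad data."""
    # Find the outermost {...} span, ignoring prose or fences around it
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if not match:
        raise ValueError("no JSON object found in response")
    data = json.loads(match.group(0))
    missing = REQUIRED_FIELDS - data.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return data

raw = 'Here is the data:\n```json\n{"company": "Acme", "role": "SRE", "salary": 180000}\n```'
job = parse_job_json(raw)
```

Validating required fields at the boundary means a schema drift in model output fails loudly at parse time instead of silently corrupting downstream data.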
Common Mistakes
Vague instructions. "Write a good summary" is not a prompt. "Write a 3-sentence summary, first sentence covers the main claim, second covers the supporting evidence, third covers the implication" is a prompt.
Front-loading complexity. Long, complex prompts often produce worse results than clear, focused ones. If your prompt requires a paragraph to explain itself, the task may need to be decomposed rather than described more thoroughly.
Ignoring temperature. Temperature 0 for extraction, classification, and structured output. Higher temperature for brainstorming, creative generation, and tasks where diversity matters. Not adjusting temperature for the task is leaving quality on the table.
Not testing systematically. Most prompt engineering happens by vibes. Write down a test set of 20-30 representative inputs, run the prompt against all of them, and measure. This turns prompt iteration from guesswork into engineering.
Over-engineering. A five-part chain-of-thought prompt for a task that works fine zero-shot adds latency and cost with no benefit. Start simple, add complexity only when you measure a problem.
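The "test set of 20-30 inputs" advice amounts to a few lines of code. A minimal harness, with the model call stubbed out (swap `classify_stub` for a real API call; the stub's keyword rule exists only to make the sketch runnable):

```python
def classify_stub(text):
    """Stand-in for an LLM classification call; replace with a real API request."""
    return "billing" if "invoice" in text.lower() else "technical"

def evaluate(classify, test_set):
    """Run the classifier over (input, expected) pairs and return accuracy."""
    hits = sum(1 for text, expected in test_set if classify(text) == expected)
    return hits / len(test_set)

test_set = [
    ("My invoice shows a double charge", "billing"),
    ("The app crashes on startup", "technical"),
    ("Where can I download my invoices?", "billing"),
]
accuracy = evaluate(classify_stub, test_set)
```

Once this exists, every prompt change gets a number instead of an impression, and regressions show up before they ship.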
When Prompting Is Enough vs. When You Need Agents
Prompting is enough for most tasks: transformation, extraction, classification, generation with a fixed structure. One input, one output, done.
You need a chain (sequential prompts) when the task has a fixed, known number of steps — extract, then enrich, then format. Each step runs once, in order.
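A fixed three-step chain is just sequential function calls, each wrapping one prompt. In this sketch the three stubs stand in for LLM calls; their return values are invented for illustration:

```python
def extract(posting):
    """Step 1: pull raw fields from the posting (stub for an extraction prompt)."""
    return {"company": "Acme", "role": "SRE"}

def enrich(record):
    """Step 2: add derived fields (stub for an enrichment prompt)."""
    seniority = "senior" if "SRE" in record["role"] else "unknown"
    return {**record, "seniority": seniority}

def format_record(record):
    """Step 3: render the final output (stub for a formatting prompt)."""
    return f"{record['role']} at {record['company']} ({record['seniority']})"

# Each step runs exactly once, in a known order
result = format_record(enrich(extract("[job posting text]")))
```

Because the step count and order are fixed, there is no planning loop to build: the control flow is the program itself.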
You need an agent when the number of steps is not known in advance, the task requires branching based on what the model discovers during execution, or the system needs to retry and recover from failures at runtime.
Most tasks do not need agents. An agent that runs five LLM calls to do what one good prompt could accomplish is not sophisticated — it is slow and expensive.
The engineering decision is: what is the minimum complexity that reliably achieves the outcome? Start there.
Going Deeper
Prompt engineering is the foundation for everything in production AI. If you want to move from individual prompts to building complete LLM applications — chaining calls, managing context, building evaluation pipelines — Phase 2 of the Agentic AI course at MindloomHQ covers exactly that.
The 13 lessons cover how LLMs actually work under the hood (tokens, attention, context windows), prompt design patterns that scale, structured output strategies, and how to evaluate LLM behavior systematically. Phase 0 and Phase 1 are completely free to start, no credit card required.