Prompt Engineering Cheatsheet
The handful of prompting patterns that consistently improve output quality. One example each, no theory.
Zero-shot
Just ask. No examples. Use when the task is well-known to the model and the format you want is unambiguous.
Classify the sentiment of the following review as positive, negative,
or neutral. Reply with one word only.
Review: "Service was slow but the food was excellent."
When it works: common tasks with clear instructions. When it doesn’t: specialized formats, custom labels, anything where “obvious” is doing a lot of work.
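The zero-shot prompt above can be wrapped as a plain function. A minimal sketch, assuming `complete` stands in for whatever LLM call you use (the stand-in model below is only for demonstration):

```python
def classify_sentiment(review: str, complete) -> str:
    # Build the zero-shot prompt: instruction plus data, no examples.
    prompt = (
        "Classify the sentiment of the following review as positive, "
        "negative, or neutral. Reply with one word only.\n"
        f'Review: "{review}"'
    )
    # Normalize: models often add whitespace, punctuation, or casing.
    return complete(prompt).strip().strip(".").lower()

# Stand-in model for demonstration (a real one would call an API):
fake_model = lambda prompt: "Positive.\n"
print(classify_sentiment("Service was slow but the food was excellent.", fake_model))  # -> positive
```

The normalization step matters in practice: even with "one word only", outputs drift in casing and punctuation.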
Few-shot
Show 2–8 examples of the task done correctly, then ask for one more. The examples teach format, edge cases, and tone faster than any instruction.
Extract the company name and amount from each sentence.
Input: "Acme paid $1,200 for the Q2 invoice."
Output: {"company": "Acme", "amount": 1200}
Input: "We received €450 from Globex on Tuesday."
Output: {"company": "Globex", "amount": 450}
Input: "Initech transferred 9.5k USD last week."
Output:
Diversity in the examples matters more than count. Pick examples that cover the failure modes you saw with zero-shot.
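Assembling a few-shot prompt from (input, output) pairs is mechanical enough to automate. A sketch using the extraction task above; `few_shot_prompt` is a hypothetical helper name:

```python
import json

def few_shot_prompt(instruction: str, examples: list, query: str) -> str:
    # Instruction first, then worked examples, then the open-ended query.
    parts = [instruction]
    for inp, out in examples:
        parts.append(f'Input: "{inp}"\nOutput: {json.dumps(out)}')
    # End with a bare "Output:" so the model completes the pattern.
    parts.append(f'Input: "{query}"\nOutput:')
    return "\n\n".join(parts)

examples = [
    ("Acme paid $1,200 for the Q2 invoice.", {"company": "Acme", "amount": 1200}),
    ("We received €450 from Globex on Tuesday.", {"company": "Globex", "amount": 450}),
]
prompt = few_shot_prompt(
    "Extract the company name and amount from each sentence.",
    examples,
    "Initech transferred 9.5k USD last week.",
)
print(prompt)
```

Keeping examples as data makes it easy to swap in new ones as you discover failure modes.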
Chain of Thought (CoT)
Tell the model to think step by step before answering. Helps on arithmetic, logic, multi-step reasoning, and any task where the surface answer hides intermediate work.
Question: A bakery has 24 cupcakes. They sell 1/3 in the morning and
half of what's left in the afternoon. How many cupcakes are left?
Think through this step by step before giving the final answer.
The trick: don’t just ask “explain your reasoning” — ask it to reason before committing. The reasoning has to come first or it’s just a post-hoc justification.
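Since the reasoning comes first, you need to pull the final answer out of the response afterwards. A sketch, assuming you also instruct the model to end with a "Final answer:" line (the marker is our convention, not a model requirement):

```python
import re

def extract_final(response: str) -> str:
    # Prefer an explicit "Final answer:" marker if the model emitted one.
    m = re.search(r"[Ff]inal answer:\s*(.+)", response)
    if m:
        return m.group(1).strip()
    # Fall back to the last non-empty line of the reasoning.
    return response.strip().splitlines()[-1]

cot_response = (
    "1/3 of 24 is 8, so 16 remain after the morning.\n"
    "Half of 16 is 8 sold in the afternoon.\n"
    "Final answer: 8"
)
print(extract_final(cot_response))  # -> 8
```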
ReAct (Reason + Act)
Interleave Thought, Action, and Observation so the model can reason, call a tool, see the result, and update. The standard recipe for agents that use tools.
You can use the tool: search(query) -> string
Question: What was Apple's revenue in Q1 2026?
Thought: I don't know recent financials, I should search.
Action: search("Apple Q1 2026 revenue")
Observation: "Apple reported $124.3B in Q1 2026."
Thought: I have the answer.
Final Answer: $124.3 billion.
ReAct is what frameworks like LangGraph and the OpenAI tool-use format implement under the hood. The pattern is older than the frameworks.
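The loop behind the transcript above fits in a few lines. A minimal sketch: `llm` and the `search` tool are stand-ins (the scripted model below just replays the transcript; a real agent would call an API):

```python
import re

def react_loop(question: str, llm, tools: dict, max_steps: int = 5):
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)
        transcript += step + "\n"
        # Stop if the model committed to a final answer.
        final = re.search(r"Final Answer:\s*(.+)", step)
        if final:
            return final.group(1).strip()
        # Otherwise parse the action, run the tool, feed back the observation.
        action = re.search(r'Action:\s*(\w+)\("(.+?)"\)', step)
        if action:
            name, arg = action.groups()
            transcript += f'Observation: "{tools[name](arg)}"\n'
    return None  # ran out of steps

# Scripted stand-ins replaying the example transcript:
steps = iter([
    'Thought: I don\'t know recent financials, I should search.\n'
    'Action: search("Apple Q1 2026 revenue")',
    "Thought: I have the answer.\nFinal Answer: $124.3 billion.",
])
fake_llm = lambda transcript: next(steps)
fake_tools = {"search": lambda q: "Apple reported $124.3B in Q1 2026."}
print(react_loop("What was Apple's revenue in Q1 2026?", fake_llm, fake_tools))
```

Production agents replace the regex parsing with structured tool-call outputs, but the reason/act/observe cycle is the same.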
Self-Consistency
Run CoT several times with non-zero temperature, then majority-vote the final answers. Trades latency and cost for accuracy on tasks with a single correct answer (math, multiple choice, structured extraction).
# Pseudocode: llm() and extract_final() are assumed helpers
from collections import Counter
samples = [llm(prompt, temperature=0.7) for _ in range(5)]
answers = [extract_final(s) for s in samples]
final = Counter(answers).most_common(1)[0][0]  # majority vote
It works because reasoning errors tend to be uncorrelated across samples, while correct chains converge on the same final answer. Don’t bother on tasks where “correct” isn’t a single value.
A few real-world rules of thumb
- Specify the format you want. “Reply with a JSON object containing the keys X, Y, Z” beats “give me a structured answer” every time.
- Put instructions first, data last. In long prompts, models lose track of instructions buried at the top; reinforce them again at the end.
- Negative instructions are powerful. “Do NOT include any explanation” saves many tokens of post-processing.
- Constrain with system prompts, not just user prompts. The system prompt has stronger pull on persona and rules.
- Test with adversarial inputs. Empty strings, very long inputs, mixed languages, prompt injection attempts. Every production prompt should survive these.
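The first rule (specify the format) pairs naturally with validating the output and retrying on failure. A sketch; `complete` and `ask_for_json` are hypothetical names, and the stand-in model below fails once on purpose:

```python
import json

def ask_for_json(prompt: str, complete, required_keys: set, retries: int = 2) -> dict:
    for _ in range(retries + 1):
        raw = complete(prompt)
        try:
            data = json.loads(raw)
            if required_keys <= data.keys():
                return data
        except json.JSONDecodeError:
            pass
        # On failure, reinforce the format instruction at the end of the prompt.
        prompt += "\nReply with a JSON object only. No explanation."
    raise ValueError("model never produced valid JSON")

# Stand-in model that chats on the first try, then complies:
replies = iter(["Sure! Here you go:", '{"company": "Acme", "amount": 1200}'])
fake = lambda p: next(replies)
print(ask_for_json("Extract the company and amount.", fake, {"company", "amount"}))
```

Validate-and-retry is cheap insurance: one retry with a reinforced instruction fixes most format drift.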