Chain-of-thought prompt builder

Stack a decomposition, rationale, and self-check so the model reasons step-by-step instead of guessing.

Frequently asked questions

1. Does 'think step by step' still work?

Weakly. Structured decomposition + rationale + self-check beats the wrapper phrase by 10-25 points on hard tasks. Use the three-part structure, not the magic words.

2. Should I use CoT on every prompt?

No. CoT adds 500-1,500 output tokens per call. On easy tasks that's 2-5× the cost for zero quality lift. Gate it behind a difficulty trigger or a user toggle.
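A difficulty trigger can be as simple as a heuristic that decides whether to append the CoT scaffold at all. A minimal sketch; the keyword signals and the 40-word threshold are illustrative assumptions, not tested values:

```python
def needs_cot(task: str) -> bool:
    """Crude difficulty trigger: only pay the CoT token tax on hard tasks."""
    hard_signals = ("why", "diagnose", "trade-off", "compare", "debug")
    long_task = len(task.split()) > 40  # long prompts tend to be harder
    has_signal = any(s in task.lower() for s in hard_signals)
    return long_task or has_signal

prompt = "Diagnose why churn rose after the pricing change."
if needs_cot(prompt):
    # Only hard tasks get the extra 500-1,500 reasoning tokens.
    prompt += "\n\nList hypotheses, weigh evidence for each, then self-check."
```

In production you would tune the signals against your own traffic, or let the user toggle it.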

3. CoT or a reasoning model (o4)?

Reasoning models win on hard math and proofs by 5-15 points. CoT on Sonnet/GPT-5 wins on medium-hard tasks by hitting 90%+ of reasoning-model quality at 1/5 the cost and 1/10 the latency.

4. Should I show the chain-of-thought to users?

Usually no: it's messy and leaks model reasoning. Ask the model to produce a final-answer block after the reasoning, and render only the final block.

5. Does CoT help coding?

Yes, especially on debugging and multi-file reasoning. Pair with a 'before writing any code, explain your plan' prompt.

Chain-of-thought prompting that actually works

"Think step by step" is a meme at this point. It helps, sometimes, but it's a blunt tool. The version that reliably lifts pass rate on hard reasoning tasks is a three-part structure: decomposition, rationale, self-check. Stack them in the prompt and the model reasons instead of pattern-matching.

The three-part structure

1. Decomposition

Before asking for an answer, ask for a list. "List the top 5 hypotheses." "List every assumption in the problem." "List the relevant data points from the context." This forces the model to surface structure it would otherwise skip.

2. Rationale per option

Ask for evidence for each item in the list. "For each hypothesis, list the evidence for and against." "Score each 1-5 on plausibility." This is where chain-of-thought actually happens: not in a vague "think step by step" wrapper, but in structured per-item reasoning.

3. Self-check

Before the final answer, ask the model to check its own work from a different perspective. "What would a skeptical VP of Product say is missing?" "Name one way this answer could be wrong." Self-check prompts reliably catch 20-40% of errors that would otherwise ship.
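Stacked together, the three parts above become a single prompt template. A minimal sketch in Python; the section wording and the `<final_answer>` tag name are illustrative choices, not a fixed API:

```python
def build_cot_prompt(task: str) -> str:
    """Assemble a decomposition + rationale + self-check prompt."""
    return "\n\n".join([
        f"Task: {task}",
        # 1. Decomposition: force the model to surface structure first.
        "Step 1 - Decompose: List every assumption in the problem "
        "and the top hypotheses before answering.",
        # 2. Rationale: structured per-item reasoning, not a vague wrapper.
        "Step 2 - Rationale: For each hypothesis, list the evidence "
        "for and against, and score it 1-5 on plausibility.",
        # 3. Self-check: review the work from a different perspective.
        "Step 3 - Self-check: Name one way this answer could be wrong.",
        # Bounded ending so the reasoning doesn't run indefinitely.
        "Finally, give your answer inside <final_answer> tags.",
    ])

print(build_cot_prompt("Why did signups drop 30% last week?"))
```

Swap the step wording for your domain; the structure, not the phrasing, does the work.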

When to use CoT vs a reasoning model

For hard math, proofs, or code requiring 10+ steps of logic, skip CoT and use a reasoning model (OpenAI o4, for example). They generate an extended internal chain of thought automatically and land 5-15 points higher on hard benchmarks.

For medium-hard tasks like diagnosis, analysis, or multi-factor trade-offs, CoT on a non-reasoning model (Sonnet 4.5 or GPT-5) usually hits 90%+ of the quality of a reasoning model at 1/5 the cost and 1/10 the latency. This is the sweet spot.

Anti-patterns

  • "Think step by step" and nothing else. Vague. Ask for specific structure.
  • Chain-of-thought on simple tasks. You are paying for unnecessary output tokens: 2-5× the cost for zero quality lift.
  • No final-answer section. Without a bounded final-answer block, the model keeps reasoning indefinitely.
  • Leaking reasoning to users. Chain-of-thought is for the model, not the user. Strip it before display or ask the model to hide it in a reasoning-only section.
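Stripping the reasoning before display is a few lines once the prompt asks for a bounded final-answer block. A sketch assuming the model was told to wrap its answer in `<final_answer>` tags (the tag name is arbitrary, pick any marker your prompt enforces):

```python
import re

def extract_final_answer(completion: str) -> str:
    """Return only the final-answer block; fall back to the raw text."""
    match = re.search(r"<final_answer>(.*?)</final_answer>",
                      completion, re.DOTALL)
    return match.group(1).strip() if match else completion.strip()

raw = ("Hypothesis 1... evidence for... evidence against... "
       "<final_answer>Ship option B.</final_answer>")
print(extract_final_answer(raw))  # Ship option B.
```

The fallback matters: if the model forgets the tags, you still show something rather than an empty string.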

Measurement

Score CoT prompts on pass rate, cost, and latency against a baseline on your eval set. If the pass-rate lift is under 5 points but the cost triples, drop it.
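That drop rule can be written down directly. A sketch assuming you already have baseline and CoT metrics from your eval runs; the 5-point and 3× thresholds come from the rule above:

```python
def keep_cot(base_pass: float, cot_pass: float,
             base_cost: float, cot_cost: float) -> bool:
    """Keep CoT only if the pass-rate lift justifies the cost multiple."""
    lift = cot_pass - base_pass           # absolute pass-rate lift
    cost_multiple = cot_cost / base_cost  # how much more CoT costs per call
    # Drop rule: a lift under 5 points at 3x+ cost is not worth shipping.
    return not (lift < 0.05 and cost_multiple >= 3.0)

print(keep_cot(0.62, 0.71, 1.0, 3.2))  # True: 9-point lift justifies 3.2x cost
print(keep_cot(0.62, 0.64, 1.0, 3.5))  # False: 2-point lift at 3.5x cost
```

Run it per task category: CoT often earns its keep on one slice of traffic and not another.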
