Chain-of-Thought Prompting / Reasoning Traces

Ubiquitous

Reasoning Strategy

A technique where the model generates intermediate reasoning steps before producing a final answer — converting opaque one-shot inference into a visible, step-by-step thought process that dramatically improves performance on complex tasks.

Chain-of-thought (CoT) can be elicited in three ways:

1. Zero-shot CoT: append "Let's think step by step" to any prompt (Kojima et al., 2022). Surprisingly effective.
2. Few-shot CoT: include worked examples with reasoning traces in the prompt (Wei et al., 2022).
3. Trained CoT: models fine-tuned on reasoning traces (e.g. OpenAI o1/o3, DeepSeek-R1) that generate "thinking tokens" automatically.

Extensions include Tree-of-Thought (explore multiple reasoning paths, backtrack), Graph-of-Thought (non-linear reasoning), and Self-Consistency (sample multiple CoT traces, take a majority vote); a minimal sketch of the latter follows below. CoT is the foundation of the "test-time compute" scaling paradigm: spending more inference tokens to get better answers.
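
A minimal sketch of zero-shot CoT plus self-consistency, assuming a generic `sample_fn` callable that stands in for whatever LLM API is in use; the `fake_llm` stub, the answer-extraction regex, and the five-sample default are illustrative choices, not part of any particular library:

```python
import re
from collections import Counter
from typing import Callable

# Zero-shot CoT: appending this instruction elicits step-by-step reasoning.
ZERO_SHOT_COT_SUFFIX = "\n\nLet's think step by step."

def extract_final_answer(trace: str) -> str:
    """Take the last number in a reasoning trace as the candidate answer."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", trace)
    return numbers[-1] if numbers else trace.strip()

def self_consistency(question: str,
                     sample_fn: Callable[[str], str],
                     n_samples: int = 5) -> str:
    """Sample several CoT traces and return the majority-vote answer."""
    prompt = question + ZERO_SHOT_COT_SUFFIX
    answers = [extract_final_answer(sample_fn(prompt)) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]

if __name__ == "__main__":
    # Stub standing in for a real LLM call; any chat-completion client would slot in here.
    def fake_llm(prompt: str) -> str:
        return "Each of the 3 boxes holds 4 pens, so 3 * 4 = 12 pens. The answer is 12."

    print(self_consistency("I have 3 boxes with 4 pens each. How many pens in total?", fake_llm))
```

In practice you would sample with nonzero temperature so the traces actually differ, and parse answers more robustly (e.g. by asking the model to emit a delimited final answer).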

Why Does This Exist?

CoT is arguably the single largest capability unlock for complex reasoning tasks. On GSM8K (grade-school math word problems), chain-of-thought prompting raised PaLM-540B's solve rate from roughly 18% to 57% (Wei et al., 2022), and self-consistency decoding pushed it to about 74%. Trained reasoning models (o1, R1) use CoT as the foundation of test-time compute scaling, spending more inference tokens to achieve qualitatively new capabilities on math, code, and science that the base model cannot achieve at all.

Explicit reasoning traces create an auditable chain from premises to conclusions. Each step can be independently checked — by a human, a process reward model, or a formal verifier. Without CoT, the model's reasoning is implicit and opaque; with CoT, it is at least inspectable. This is a necessary (though not sufficient) foundation for verifiable AI reasoning.
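
As a toy illustration of step-level checking (a stand-in for a real process reward model or formal verifier), the sketch below splits a trace into sentence-level steps and re-checks any explicit arithmetic claim it finds; the step-splitting regex and the `check_arithmetic` helper are hypothetical, illustrative choices:

```python
import re

def split_steps(trace: str) -> list[str]:
    """Split a chain-of-thought trace into individual sentence-level steps."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+|\n+", trace) if s.strip()]

def check_arithmetic(step: str):
    """Verify any explicit 'a OP b = c' claim in a step; None means nothing checkable."""
    m = re.search(r"(-?\d+)\s*([+\-*/])\s*(-?\d+)\s*=\s*(-?\d+)", step)
    if m is None:
        return None
    a, op, b, c = int(m.group(1)), m.group(2), int(m.group(3)), int(m.group(4))
    result = {"+": a + b, "-": a - b, "*": a * b, "/": a / b if b else float("nan")}[op]
    return result == c

trace = ("There are 3 boxes. Each box holds 4 pens. "
         "So 3 * 4 = 12 pens in total. Therefore the answer is 12.")

# Each step gets its own verdict, making the chain auditable rather than opaque.
for i, step in enumerate(split_steps(trace), start=1):
    verdict = check_arithmetic(step)
    status = {True: "verified", False: "contradicted", None: "unchecked"}[verdict]
    print(f"step {i}: {status:12s} | {step}")
```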

Reasoning traces reveal where a model is confident versus uncertain, where it makes assumptions, and where its logic has gaps. A model that says "I'm not sure about step 3, but assuming X..." surfaces epistemic metadata that a model jumping straight to a final answer cannot. CoT makes the model's epistemic state partially observable, even if not perfectly calibrated.

CoT provides a high-level behavioral trace that complements low-level circuit analysis. If a model's stated reasoning (CoT) diverges from its actual internal computation (circuits), that discrepancy is itself an interpretability finding — revealing where models confabulate reasoning rather than report it. CoT faithfulness research is a bridge between behavioral and mechanistic interpretability.