Chain-of-Thought Prompting / Reasoning Traces

Ubiquitous

Reasoning Strategy

A technique where the model generates intermediate reasoning steps before producing a final answer — converting opaque one-shot inference into a visible, step-by-step thought process that dramatically improves performance on complex tasks.

Chain-of-thought (CoT) can be elicited in three ways:

1. Zero-shot CoT: append "Let's think step by step" to any prompt (Kojima et al., 2022). Surprisingly effective.
2. Few-shot CoT: include worked examples with reasoning traces in the prompt (Wei et al., 2022).
3. Trained CoT: models fine-tuned on reasoning traces (e.g. OpenAI o1/o3, DeepSeek-R1) that generate "thinking tokens" automatically.

Extensions include Tree-of-Thought (explore multiple reasoning paths, backtrack), Graph-of-Thought (non-linear reasoning), and Self-Consistency (sample multiple CoT traces, take a majority vote); a minimal sketch of the latter follows below. CoT is the foundation of the "test-time compute" scaling paradigm: spending more inference tokens to get better answers.
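
A minimal sketch of zero-shot CoT plus self-consistency, assuming a generic `sample_fn` callable that stands in for whatever LLM API is in use; the `fake_llm` stub, the answer-extraction regex, and the five-sample default are illustrative choices, not part of any particular library:

```python
import re
from collections import Counter
from typing import Callable

# Zero-shot CoT: appending this instruction elicits step-by-step reasoning.
ZERO_SHOT_COT_SUFFIX = "\n\nLet's think step by step."

def extract_final_answer(trace: str) -> str:
    """Take the last number in a reasoning trace as the candidate answer."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", trace)
    return numbers[-1] if numbers else trace.strip()

def self_consistency(question: str,
                     sample_fn: Callable[[str], str],
                     n_samples: int = 5) -> str:
    """Sample several CoT traces and return the majority-vote answer."""
    prompt = question + ZERO_SHOT_COT_SUFFIX
    answers = [extract_final_answer(sample_fn(prompt)) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]

if __name__ == "__main__":
    # Stub standing in for a real LLM call; any chat-completion client would slot in here.
    def fake_llm(prompt: str) -> str:
        return "Each of the 3 boxes holds 4 pens, so 3 * 4 = 12 pens. The answer is 12."

    print(self_consistency("I have 3 boxes with 4 pens each. How many pens in total?", fake_llm))
```

In practice you would sample with nonzero temperature so the traces actually differ, and parse answers more robustly (e.g. by asking the model to emit a delimited final answer).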

Why Does This Exist?

CoT is arguably the single largest capability unlock for complex reasoning tasks. On GSM8K (grade-school math word problems), chain-of-thought prompting raised PaLM-540B's solve rate from roughly 18% to 57% (Wei et al., 2022), and self-consistency decoding pushed it to about 74%. Trained reasoning models (o1, R1) use CoT as the foundation of test-time compute scaling, spending more inference tokens to achieve qualitatively new capabilities on math, code, and science that the base model cannot achieve at all.

Explicit reasoning traces create an auditable chain from premises to conclusions. Each step can be independently checked — by a human, a process reward model, or a formal verifier. Without CoT, the model's reasoning is implicit and opaque; with CoT, it is at least inspectable. This is a necessary (though not sufficient) foundation for verifiable AI reasoning.
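
As a toy illustration of step-level checking (a stand-in for a real process reward model or formal verifier), the sketch below splits a trace into sentence-level steps and re-checks any explicit arithmetic claim it finds; the step-splitting regex and the `check_arithmetic` helper are hypothetical, illustrative choices:

```python
import re

def split_steps(trace: str) -> list[str]:
    """Split a chain-of-thought trace into individual sentence-level steps."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+|\n+", trace) if s.strip()]

def check_arithmetic(step: str):
    """Verify any explicit 'a OP b = c' claim in a step; None means nothing checkable."""
    m = re.search(r"(-?\d+)\s*([+\-*/])\s*(-?\d+)\s*=\s*(-?\d+)", step)
    if m is None:
        return None
    a, op, b, c = int(m.group(1)), m.group(2), int(m.group(3)), int(m.group(4))
    result = {"+": a + b, "-": a - b, "*": a * b, "/": a / b if b else float("nan")}[op]
    return result == c

trace = ("There are 3 boxes. Each box holds 4 pens. "
         "So 3 * 4 = 12 pens in total. Therefore the answer is 12.")

# Each step gets its own verdict, making the chain auditable rather than opaque.
for i, step in enumerate(split_steps(trace), start=1):
    verdict = check_arithmetic(step)
    status = {True: "verified", False: "contradicted", None: "unchecked"}[verdict]
    print(f"step {i}: {status:12s} | {step}")
```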

Reasoning traces reveal where a model is confident versus uncertain, where it makes assumptions, and where its logic has gaps. A model that says "I'm not sure about step 3, but assuming X..." surfaces epistemic metadata that a model jumping straight to a final answer cannot. CoT makes the model's epistemic state partially observable, even if not perfectly calibrated.

CoT provides a high-level behavioral trace that complements low-level circuit analysis. If a model's stated reasoning (CoT) diverges from its actual internal computation (circuits), that discrepancy is itself an interpretability finding — revealing where models confabulate reasoning rather than report it. CoT faithfulness research is a bridge between behavioral and mechanistic interpretability.