ICLR 2023 (arXiv 2022)

ReAct: Synergizing Reasoning and Acting in Language Models

Yao, Zhao, Yu, Du, Shafran, Narasimhan, Cao

TL;DR
Interleave reasoning traces ("Thought") with tool calls ("Action") and their results ("Observation") in a single prompt loop. The template every agent framework still uses.

What it says

ReAct proposes a prompting pattern where the model alternates between reasoning steps and tool actions. Each loop iteration looks like: Thought (free-form reasoning) → Action (call a tool with arguments) → Observation (the tool’s result). The model feeds the observation back into its next thought. On HotpotQA and ALFWorld, ReAct beats both pure chain-of-thought (which can’t check facts) and pure acting (which can’t plan) by combining the strengths of each.
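The loop above can be sketched in a few lines. This is a hypothetical minimal version, not the paper's actual HotpotQA setup: the `model` and `lookup` functions are canned stubs standing in for an LLM call and a Wikipedia search tool, so the control flow is runnable as-is.

```python
import re

# Toy "tool" the agent can call (stand-in for Wikipedia search/lookup).
def lookup(term: str) -> str:
    facts = {"ReAct": "ReAct interleaves reasoning and acting."}
    return facts.get(term, "No result found.")

# Stub language model: emits canned Thought/Action lines so the loop runs
# without an API key. A real agent would prompt an LLM with the transcript.
def model(transcript: str) -> str:
    if "Observation" not in transcript:
        return "Thought: I should look up ReAct.\nAction: lookup[ReAct]"
    return ("Thought: I have the answer.\n"
            "Action: finish[ReAct interleaves reasoning and acting.]")

def react_loop(question: str, max_steps: int = 5) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = model(transcript)           # Thought + Action
        transcript += step + "\n"
        match = re.search(r"Action: (\w+)\[(.*)\]", step)
        if not match:
            break
        name, arg = match.groups()
        if name == "finish":               # terminal action returns the answer
            return arg
        obs = lookup(arg) if name == "lookup" else "Unknown tool."
        transcript += f"Observation: {obs}\n"   # feed result back to the model
    return "No answer."
```

The key design point is that the transcript itself is the agent's state: every Thought, Action, and Observation is appended to one growing prompt, so the next reasoning step conditions on everything seen so far.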

Why it matters

ReAct is the template that shaped every major agent framework: LangChain agents, AutoGPT, and the tool-use protocols inside modern chat APIs all trace back to this paper. The Thought/Action/Observation loop is now so standard that most function-calling APIs are essentially ReAct with structured JSON for the action.
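To make the correspondence concrete, here is a hypothetical sketch of how a free-text ReAct action maps onto the structured tool-call shape most chat APIs use today. The `"Action: name[arg]"` syntax and the output field names are illustrative assumptions, not any specific vendor's schema.

```python
import json

def to_tool_call(action_line: str) -> dict:
    # "Action: name[arg]" -> a function-calling-style message.
    _, _, rest = action_line.partition(": ")
    name, arg = rest.rstrip("]").split("[", 1)
    return {
        "type": "function",
        "name": name,
        # Arguments become a JSON string, as in typical function-calling APIs.
        "arguments": json.dumps({"query": arg}),
    }

call = to_tool_call("Action: search[ReAct paper]")
# The tool's result then comes back as a structured "tool" message rather
# than an inline "Observation:" line -- same loop, different serialization.
```

Swapping free text for JSON mainly buys reliable parsing; the Thought/Action/Observation control flow is unchanged.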

  • Toolformer (Schick et al., 2023) — self-supervised tool-use training instead of prompting.
  • Reflexion (Shinn et al., 2023) — adds a self-critique loop on top of ReAct.
  • Chain-of-Thought (Wei et al., 2022) — the reasoning half of the ReAct pattern.