NeurIPS 2023
Tree of Thoughts: Deliberate Problem Solving with Large Language Models
Yao, Yu, Zhao, Shafran, Griffiths, Cao, Narasimhan
TL;DR
Search over a tree of partial reasoning paths with lookahead and backtracking, using the LLM to both generate branches and evaluate them. Big wins on puzzles that trip up linear CoT.
What it says
Chain-of-thought produces one linear reasoning trace. Tree-of-thoughts generalizes this to a tree: at each step the model proposes several candidate next “thoughts”, a value function (also an LLM call) scores them, and a search algorithm (BFS or DFS) explores the promising branches with the ability to backtrack. On Game of 24 and mini crosswords, ToT solves problems CoT can’t touch at the cost of many more LLM calls.
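The BFS variant described above can be sketched in a few lines. This is a toy illustration, not the paper's implementation: `propose`, `value`, and `is_solution` stand in for the LLM calls the paper uses, and here they solve a trivial stand-in task (find three digits summing to 10) so the search loop itself is runnable.

```python
from typing import Callable, List, Optional

def tot_bfs(root: str,
            propose: Callable[[str], List[str]],
            value: Callable[[str], float],
            is_solution: Callable[[str], bool],
            beam_width: int = 3,
            max_depth: int = 4) -> Optional[str]:
    """Breadth-first Tree of Thoughts: expand every frontier state,
    score the candidate thoughts, and keep only the top beam_width."""
    frontier = [root]
    for _ in range(max_depth):
        candidates = [c for state in frontier for c in propose(state)]
        for c in candidates:
            if is_solution(c):
                return c
        # Prune to the highest-valued branches; low-valued ones are
        # abandoned, which is where the backtracking behavior comes from.
        frontier = sorted(candidates, key=value, reverse=True)[:beam_width]
        if not frontier:
            return None
    return None

# Toy stand-ins for the LLM calls (hypothetical task: 3 digits summing to 10).
def propose(state: str) -> List[str]:
    return [state + d for d in "0123456789"]

def value(state: str) -> float:
    return -abs(10 - sum(int(d) for d in state))

def is_solution(state: str) -> bool:
    return len(state) == 3 and sum(int(d) for d in state) == 10
```

In the paper both `propose` and `value` are themselves prompted LLM calls, which is why ToT costs many more model invocations than a single linear CoT trace.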
Why it matters
ToT reframed LLM inference as a search problem and inspired a wave of work on deliberate reasoning — Graph of Thoughts, Reasoning via Planning, LLM Monte Carlo Tree Search — as well as test-time compute techniques that later showed up in frontier reasoning models.
Read next
- Chain-of-Thought (Wei et al., 2022) — the linear baseline.
- Self-Consistency (Wang et al., 2022) — sample many CoTs and vote.
- Let’s Verify Step by Step (Lightman et al., 2023) — process reward models for step-level scoring.