SIGIR 2020

ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT

Omar Khattab, Matei Zaharia

TL;DR
Keep per-token embeddings for both query and document and score via a sum of max-similarities. Much more expressive than a single-vector dual encoder while still fast enough to index.

What it says

Instead of pooling each passage into one vector (as DPR does), ColBERT keeps every token’s contextual embedding. At scoring time, for each query token it finds the maximum similarity over the document’s token embeddings and sums those maxima — the “MaxSim” late-interaction operator. This preserves fine-grained match signals that pooling would destroy, and it remains amenable to approximate nearest-neighbor indexing because document token embeddings are computed offline, independently of any query.
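A minimal sketch of the MaxSim operator in NumPy, assuming row-wise L2-normalized token embeddings so that dot products equal cosine similarities (the matrices `Q` and `D` and the helper `normalize` are illustrative, not from the paper's codebase):

```python
import numpy as np

def maxsim_score(Q, D):
    """Late-interaction MaxSim score.

    Q: (num_query_tokens, dim) query token embeddings
    D: (num_doc_tokens, dim) document token embeddings
    Rows are assumed L2-normalized, so Q @ D.T gives cosine similarities.
    For each query token, take the max similarity over document tokens,
    then sum across query tokens.
    """
    sim = Q @ D.T                    # (num_query_tokens, num_doc_tokens)
    return float(sim.max(axis=1).sum())

def normalize(X):
    # L2-normalize each row so dot product = cosine similarity
    return X / np.linalg.norm(X, axis=1, keepdims=True)

# Toy example: a hypothetical 2-token query scored against a 3-token passage.
rng = np.random.default_rng(0)
Q = normalize(rng.normal(size=(2, 8)))
D = normalize(rng.normal(size=(3, 8)))
score = maxsim_score(Q, D)
```

Because each document's token matrix `D` is fixed at indexing time, these embeddings can be precomputed and stored, and candidate passages can be found with ANN search over the token vectors before exact MaxSim re-scoring.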

Why it matters

ColBERT is the strongest open retrieval paradigm when you care about quality and can pay the extra storage and compute. The 2021 follow-up ColBERTv2 added compression and made it practical at web scale. Modern retrieval research (SPLADE, GTR, ColPali for documents as images) keeps revisiting the dense-vs-late-interaction tradeoff that this paper framed.

  • ColBERTv2 (Santhanam et al., 2021) — compression and denoised supervision.
  • SPLADE (Formal et al., 2021) — learned sparse lexical retrieval as a third option.
  • DPR (Karpukhin et al., 2020) — the single-vector baseline ColBERT beats.