ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT
Omar Khattab and Matei Zaharia (SIGIR 2020)
What it says
Instead of pooling each passage into one vector (as DPR does), ColBERT keeps every token’s contextual embedding. At scoring time, for each query token it finds the max dot product over document tokens and sums those maxes — the “MaxSim” late-interaction operator. This preserves fine-grained match signals that pooling would destroy, and it’s still amenable to approximate nearest neighbor indexing because document tokens are embedded independently.
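The MaxSim operator described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: it assumes the query and document token embeddings are already computed (and, as in ColBERT, normalized so that dot product equals cosine similarity), and the function name `maxsim_score` is ours.

```python
import numpy as np

def maxsim_score(query_embs: np.ndarray, doc_embs: np.ndarray) -> float:
    """Late-interaction MaxSim: for each query token, take the max
    dot product over all document tokens, then sum over query tokens.

    query_embs: (num_query_tokens, dim)
    doc_embs:   (num_doc_tokens, dim)
    """
    # Full token-to-token similarity matrix: (num_query_tokens, num_doc_tokens)
    sims = query_embs @ doc_embs.T
    # Best-matching document token per query token, summed
    return float(sims.max(axis=1).sum())

# Toy example with 2-d embeddings: each query token finds its
# best match among the three document tokens (scores 1.0 each).
q = np.array([[1.0, 0.0], [0.0, 1.0]])
d = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(maxsim_score(q, d))  # → 2.0
```

Because each document token is embedded without looking at the query, `doc_embs` can be precomputed and indexed offline; only the cheap max-and-sum runs at query time.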
Why it matters
ColBERT is among the strongest open retrieval paradigms when you care about quality and can pay the extra storage and compute. The 2021 follow-up ColBERTv2 added residual compression and denoised supervision, making it practical at web scale. Modern retrieval research (SPLADE, GTR, ColPali for documents as images) keeps revisiting the dense-vs-late-interaction tradeoff that this paper framed.
Read next
- ColBERTv2 (Santhanam et al., 2021) — compression and denoised supervision.
- SPLADE (Formal et al., 2021) — learned sparse lexical retrieval as a third option.
- DPR (Karpukhin et al., 2020) — the single-vector baseline ColBERT beats.