SIGIR 2020

ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT

Omar Khattab, Matei Zaharia

TL;DR
Keep per-token embeddings for both query and document and score via a sum of max-similarities. Much more expressive than a single-vector dual encoder while still fast enough to index.

What it says

Instead of pooling each passage into one vector (as DPR does), ColBERT keeps every token’s contextual embedding. At scoring time, for each query token it finds the maximum similarity over the document’s token embeddings and sums those maxima — the “MaxSim” late-interaction operator. This preserves fine-grained match signals that pooling would destroy, and it remains amenable to approximate nearest-neighbor indexing because document token embeddings are computed offline, independently of any query.
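A minimal sketch of the MaxSim operator in NumPy, assuming row-wise L2-normalized token embeddings so that dot products equal cosine similarities (the matrices `Q` and `D` and the helper `normalize` are illustrative, not from the paper's codebase):

```python
import numpy as np

def maxsim_score(Q, D):
    """Late-interaction MaxSim score.

    Q: (num_query_tokens, dim) query token embeddings
    D: (num_doc_tokens, dim) document token embeddings
    Rows are assumed L2-normalized, so Q @ D.T gives cosine similarities.
    For each query token, take the max similarity over document tokens,
    then sum across query tokens.
    """
    sim = Q @ D.T                    # (num_query_tokens, num_doc_tokens)
    return float(sim.max(axis=1).sum())

def normalize(X):
    # L2-normalize each row so dot product = cosine similarity
    return X / np.linalg.norm(X, axis=1, keepdims=True)

# Toy example: a hypothetical 2-token query scored against a 3-token passage.
rng = np.random.default_rng(0)
Q = normalize(rng.normal(size=(2, 8)))
D = normalize(rng.normal(size=(3, 8)))
score = maxsim_score(Q, D)
```

Because each document's token matrix `D` is fixed at indexing time, these embeddings can be precomputed and stored, and candidate passages can be found with ANN search over the token vectors before exact MaxSim re-scoring.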

Why it matters

ColBERT is the strongest open retrieval paradigm when you care about quality and can pay the extra storage and compute. The 2021 follow-up ColBERTv2 added compression and made it practical at web scale. Modern retrieval research (SPLADE, GTR, ColPali for documents as images) keeps revisiting the dense-vs-late-interaction tradeoff that this paper framed.

  • ColBERTv2 (Santhanam et al., 2021) — compression and denoised supervision.
  • SPLADE (Formal et al., 2021) — learned sparse lexical retrieval as a third option.
  • DPR (Karpukhin et al., 2020) — the single-vector baseline ColBERT beats.