NeurIPS 2020
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
Lewis, Perez, Piktus, Petroni, Karpukhin, Goyal, Küttler, Lewis, Yih, Rocktäschel, Riedel, Kiela
TL;DR
Introduces the term "RAG" and a model that retrieves passages from a Wikipedia index and conditions a seq2seq generator on them. The blueprint for every RAG system that came after.
Why it matters
This is where the term Retrieval-Augmented Generation comes from. The paper pairs a Dense Passage Retrieval (DPR) bi-encoder with a BART seq2seq generator and shows that conditioning generation on retrieved evidence beats pure parametric language models on knowledge-intensive tasks (open-domain QA, fact verification, abstractive QA).
The architecture is now considered obvious — query the corpus, paste the results into the prompt, generate the answer. Every production LLM application that uses your private data is some descendant of this idea.
Key contributions
- RAG-Sequence vs RAG-Token — two ways to marginalize over retrieved documents during generation. Sequence conditions the whole output on the same document and marginalizes once at the sequence level; Token marginalizes at every position, so each generated token can draw on a different document.
- Joint training of the retriever and generator. The retriever (a dense bi-encoder over Wikipedia) is fine-tuned alongside the generator using only the final answer as supervision; the document encoder and index stay fixed so the corpus never needs re-embedding during training.
- Empirical wins on Natural Questions, TriviaQA, WebQuestions, and fact verification. Crucially, RAG fixed factual errors that pure parametric models confidently produced.
- A clean separation between what the model knows (parameters) and what it can look up (an external, swappable index). This is the property modern RAG products care about most — you can update the index without retraining the model.
Why it still matters
- The basic loop — embed the query, retrieve top-k, pass to a generator — is unchanged in production RAG today.
- The “swap the index, don’t retrain the model” property is exactly what makes RAG cheaper and more controllable than fine-tuning for fresh or private knowledge.
- Modern improvements (hybrid search, rerankers, query rewriting, iterative retrieval, agentic RAG) are all bolted on top of this same framework.
Follow-up reading
- REALM (2020) — retrieval-augmented pretraining.
- Dense Passage Retrieval (2020) — the dense bi-encoder retriever that RAG depends on.
- ColBERT (2020) — late interaction for more accurate dense retrieval.
- Self-RAG, FLARE, GraphRAG — modern variants that add adaptive retrieval, planning, and structured knowledge.
→ Internal: Retrieval-Augmented Generation (RAG)