NeurIPS 2020

Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

Lewis, Perez, Piktus, Petroni, Karpukhin, Goyal, Küttler, Lewis, Yih, Rocktäschel, Riedel, Kiela

TL;DR
Introduces the term "RAG" and a model that retrieves passages from a Wikipedia index and conditions a seq2seq generator on them. The blueprint for every RAG system that came after.

Why it matters

This is where the term Retrieval-Augmented Generation comes from. The paper marries a Dense Passage Retrieval (DPR) retriever with a BART seq2seq generator and shows that conditioning generation on retrieved evidence beats purely parametric language models on knowledge-intensive tasks (open-domain QA, fact verification, abstractive QA).

The architecture is now considered obvious — query the corpus, paste the results into the prompt, generate the answer. Every production LLM application that uses your private data is some descendant of this idea.

Key contributions

  • RAG-Sequence vs RAG-Token: two ways to marginalize over retrieved documents during generation. RAG-Sequence conditions on the same document for the entire output and marginalizes at the sequence level; RAG-Token marginalizes at every step, so each generated token can draw on a different document.
  • Joint training of the retriever and generator. The retriever (a dense bi-encoder over a Wikipedia index) is fine-tuned alongside the generator with only the final answer as supervision; in practice only the query encoder is updated, while the document encoder and index stay fixed.
  • Empirical wins on Natural Questions, TriviaQA, WebQuestions, and FEVER fact verification. Crucially, RAG produced more factual and specific generations than a parametric-only BART baseline, correcting errors the parametric model made confidently.
  • A clean separation between what the model knows (parameters) and what it can look up (an external, swappable index). This is the property modern RAG products care about most — you can update the index without retraining the model.
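The difference between the two marginalization strategies in the first bullet can be made concrete with toy numbers. The probabilities below are illustrative, not from the paper:

```python
import numpy as np

# Illustrative probabilities for k=2 retrieved documents and a 3-token
# target. doc_probs is p(z|x); token_probs[z, t] is p(y_t | x, z, y_<t).
doc_probs = np.array([0.7, 0.3])
token_probs = np.array([[0.9, 0.8, 0.9],
                        [0.2, 0.5, 0.4]])

# RAG-Sequence: one document per output, marginalized at the sequence level:
#   p(y|x) = sum_z p(z|x) * prod_t p(y_t | x, z, y_<t)
seq_prob = float(np.sum(doc_probs * np.prod(token_probs, axis=1)))

# RAG-Token: marginalize over documents at every generated token:
#   p(y|x) = prod_t sum_z p(z|x) * p(y_t | x, z, y_<t)
tok_prob = float(np.prod(doc_probs @ token_probs))
```

The token-level mixture lets the model switch evidence sources mid-generation, which is why RAG-Token tends to help when an answer combines facts from multiple passages.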

Why it still matters

  • The basic loop — embed the query, retrieve top-k, pass to a generator — is unchanged in production RAG today.
  • The “swap the index, don’t retrain the model” property is exactly what makes RAG cheaper and more controllable than fine-tuning for fresh or private knowledge.
  • Modern improvements (hybrid search, rerankers, query rewriting, iterative retrieval, agentic RAG) are all bolted on top of this same framework.
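That loop can be sketched end to end. Everything below is a toy stand-in: `embed` is a hash-seeded fake embedder in place of a trained dense bi-encoder, and the generator call is left as a comment:

```python
import hashlib
import numpy as np

def embed(text: str, dim: int = 8) -> np.ndarray:
    """Toy deterministic 'embedding': a unit vector seeded from a stable
    hash of the text. A real system would use a trained bi-encoder (DPR)."""
    seed = int(hashlib.md5(text.encode()).hexdigest(), 16) % 2**32
    v = np.random.default_rng(seed).standard_normal(dim)
    return v / np.linalg.norm(v)

def retrieve_top_k(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Embed the query, score every passage by inner product, keep top-k."""
    q = embed(query)
    scores = np.array([q @ embed(doc) for doc in corpus])
    return [corpus[i] for i in np.argsort(scores)[::-1][:k]]

def rag_answer(query: str, corpus: list[str]) -> str:
    """Retrieve evidence and paste it into the generator's input."""
    docs = retrieve_top_k(query, corpus)
    prompt = "\n".join(docs) + "\nQuestion: " + query
    # A real pipeline would now call a seq2seq generator on `prompt`;
    # here we just return the assembled input.
    return prompt
```

Swapping `corpus` (or the index built from it) updates what the system can look up without touching any model weights, which is the separation the paper highlights.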

Follow-up reading

  • REALM (2020) — retrieval-augmented pretraining.
  • Dense Passage Retrieval (2020) — the dense bi-encoder retriever that RAG depends on.
  • ColBERT (2020) — late interaction for more accurate dense retrieval.
  • Self-RAG, FLARE, GraphRAG — modern variants that add adaptive retrieval, planning, and structured knowledge.

→ Internal: Retrieval-Augmented Generation (RAG)