arXiv 2018 · NAACL 2019

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Devlin, Chang, Lee, Toutanova

TL;DR
Pretrain an encoder-only transformer with masked language modeling and next-sentence prediction, then fine-tune for downstream tasks. Set state of the art on 11 NLP benchmarks.

What it says

BERT takes the encoder half of the original transformer and pretrains it on a huge text corpus with two objectives: masked language modeling (predict randomly hidden tokens from their bidirectional context) and next-sentence prediction. Because attention is bidirectional, each token sees the full left and right context — unlike GPT-style left-to-right models. Fine-tuning adds a small task head on top of the [CLS] token or per-token outputs. The result pushes SOTA across GLUE, SQuAD, and NER by large margins.
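The masking objective can be sketched in a few lines. This is a toy illustration (token strings instead of IDs; `mask_tokens` and `mask_prob` are names chosen here, not from the paper's code), but the 80/10/10 split it implements is the one BERT uses: of the ~15% of positions selected for prediction, 80% are replaced with [MASK], 10% with a random token, and 10% are left unchanged.

```python
import random

def mask_tokens(tokens, vocab, mask_prob=0.15):
    """BERT-style MLM corruption (toy sketch).

    Selects ~mask_prob of positions as prediction targets; of those,
    80% become [MASK], 10% a random vocab token, 10% stay unchanged.
    Returns corrupted inputs and per-position labels (None = no loss).
    """
    inputs, labels = [], []
    for tok in tokens:
        if random.random() < mask_prob:
            labels.append(tok)  # model must predict the original here
            r = random.random()
            if r < 0.8:
                inputs.append("[MASK]")        # 80%: mask
            elif r < 0.9:
                inputs.append(random.choice(vocab))  # 10%: random token
            else:
                inputs.append(tok)             # 10%: unchanged
        else:
            inputs.append(tok)
            labels.append(None)  # position not selected, no loss
    return inputs, labels
```

The unchanged-and-random cases keep the model from relying on [MASK] being present at prediction positions, since [MASK] never appears at fine-tuning time.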

Why it matters

BERT turned “pretrain then fine-tune” into the dominant NLP workflow for half a decade. Encoder-only transformers are still the backbone of nearly all production embedding and classification systems: BGE, E5, sentence-transformers, and most reranker models are direct descendants. Even now, when someone says “just embed it” they mean a BERT-lineage model.

  • RoBERTa (Liu et al., 2019) — the same architecture with a more careful training recipe (more data, no NSP) that beats BERT.
  • DistilBERT (Sanh et al., 2019) — a distilled student model, ~40% smaller and ~60% faster, retaining most of the accuracy.
  • DeBERTa (He et al., 2020) — disentangled attention over content and position, a stronger encoder.