Training Language Models to Follow Instructions with Human Feedback
Ouyang, Wu, Jiang, et al.
What it says
OpenAI describes the three-stage pipeline that shipped as InstructGPT. Stage 1: supervised fine-tuning (SFT) on human-written demonstrations of ideal responses. Stage 2: collect human preferences between pairs of model outputs and train a reward model to predict them. Stage 3: use PPO to fine-tune the SFT model against the reward model, with a KL penalty back to the SFT reference to avoid drift. In blind human evals, the 1.3B InstructGPT model is preferred over the 175B base GPT-3.
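The two learning signals in stages 2 and 3 can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the reward model is trained with a pairwise logistic (Bradley-Terry style) loss so the preferred response scores higher, and the PPO stage optimizes the reward-model score minus a KL term toward the SFT reference. The `beta` coefficient here is an illustrative placeholder, not the paper's value.

```python
import math

def reward_model_loss(r_chosen: float, r_rejected: float) -> float:
    """Pairwise preference loss: -log sigmoid(r_chosen - r_rejected).
    Minimized when the reward model scores the human-preferred
    response above the rejected one."""
    return -math.log(1.0 / (1.0 + math.exp(-(r_chosen - r_rejected))))

def ppo_reward(rm_score: float, logp_policy: float, logp_ref: float,
               beta: float = 0.02) -> float:
    """Shaped reward for stage 3: reward-model score minus a KL-style
    penalty (log-prob gap to the frozen SFT reference). The penalty
    grows as the policy drifts from the reference, anchoring it.
    beta is an illustrative coefficient, not the paper's setting."""
    return rm_score - beta * (logp_policy - logp_ref)
```

Intuition check: when chosen and rejected scores tie, the loss is log 2; widening the margin drives it toward zero. When the policy matches the reference exactly, the shaped reward is just the reward-model score.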
Why it matters
This is the paper behind ChatGPT and the reason every production chat model since then has gone through a preference-learning stage. It made clear that alignment and usefulness are separate dimensions from raw capability — a smaller model trained with RLHF can feel much better to users than a larger untuned one. The exact recipe (SFT → RM → PPO) is still the baseline to beat.
Read next
- Constitutional AI (Bai et al., 2022) — replacing some human labels with model-generated critiques.
- Direct Preference Optimization (Rafailov et al., 2023) — skip the RM and PPO stages.
- LLaMA-2 paper (Touvron et al., 2023) — a very detailed open account of the same recipe at scale.