RLHF
Reinforcement Learning from Human Feedback: fine-tuning a language model using a reward model trained on human preference data, with reinforcement learning to optimize for the reward.
In one line
Train a reward model from human preference comparisons, then use RL (typically PPO) to fine-tune the language model to score high under that reward.
What it actually means
The RLHF recipe, as popularized by InstructGPT, has three stages. First, supervised fine-tuning on demonstrations of good behavior. Second, collect human preferences: show two model outputs for the same prompt, ask which is better, train a reward model to predict those preferences. Third, use PPO (or similar) to fine-tune the policy so it maximizes the reward model’s score, with a KL penalty against the original SFT model to keep it from drifting into nonsense. DPO, a newer method, skips the explicit reward model and reward-maximization step — it derives an equivalent loss directly from the preferences.
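The reward model in stage two is usually trained with a Bradley-Terry pairwise loss: maximize the probability that the human-chosen response scores higher than the rejected one. A minimal sketch of that objective on scalar scores (the function names and example values are illustrative, not from any particular library):

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def preference_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry pairwise loss: -log P(chosen beats rejected).

    r_chosen and r_rejected are the reward model's scalar scores for
    the human-preferred and human-rejected responses to the same prompt.
    """
    return -math.log(sigmoid(r_chosen - r_rejected))

# Loss is small when the reward model already ranks the chosen response
# higher, and large when it ranks the rejected one higher.
agree = preference_loss(2.0, 0.0)
disagree = preference_loss(0.0, 2.0)
```

Training the reward model means minimizing this loss over many (prompt, chosen, rejected) triplets; DPO rearranges the same preference probability into a loss on the policy's own log-probabilities, which is why no separate reward model is needed.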
Why it matters
RLHF is what turned raw next-token predictors into chat assistants. The difference between a base model and an RLHF’d model is enormous: base models complete text, RLHF models follow instructions, refuse harmful requests, and stay on topic. Almost every production chat model you’ve used — GPT-4, Claude, Gemini — went through some form of preference learning.
Example
1. SFT: supervised training on prompt → response demos
2. RM: train reward model on (prompt, good, bad) triplets
3. RL: PPO with reward = RM(prompt, response) - beta * KL(policy || SFT)
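The shaped reward in step 3 can be sketched as below. The inputs (RM score, per-sequence log-probabilities, beta) are illustrative, and the KL term is the usual single-sample estimate log(policy/SFT) on the sampled response rather than the full divergence:

```python
def shaped_reward(rm_score: float,
                  logp_policy: float,
                  logp_sft: float,
                  beta: float = 0.1) -> float:
    """Reward-model score minus a KL penalty against the frozen SFT model.

    logp_policy - logp_sft is a single-sample estimate of
    KL(policy || SFT) on this response; beta controls how hard the
    policy is pulled back toward the SFT model.
    """
    kl_estimate = logp_policy - logp_sft
    return rm_score - beta * kl_estimate
```

If the policy drifts toward text the SFT model finds unlikely (logp_policy much larger than logp_sft on its own samples), the penalty eats into the reward, which is what keeps reward-hacked gibberish from scoring well.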
You’ll hear it when
- Reading about instruction tuning and chat model alignment.
- Comparing RLHF, DPO, KTO, and RLAIF.
- Debugging reward hacking or mode collapse in alignment.
- Discussing why base models sound different from chat models.