Epoch
One full pass of the training algorithm over the entire training dataset.
In one line
One full pass over the training set — if you have 100k examples and a batch size of 100, one epoch is 1000 optimizer steps.
What it actually means
Training usually happens in mini-batches: you sample a batch, compute the loss, backprop, update. An epoch is when you’ve seen every example once (typically by shuffling at the start and iterating). You then repeat for multiple epochs. The number of epochs is a hyperparameter tied to how fast the model converges and how quickly it overfits. For large pretraining runs you often train for less than one epoch on a huge corpus; for small-dataset fine-tuning you might do 3–10.
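The epoch-to-step arithmetic can be sketched directly; the dataset size and batch size below are the illustrative numbers from the one-liner, and the `drop_last` flag is an assumption mirroring common data-loader behavior:

```python
import math

def steps_per_epoch(num_examples: int, batch_size: int, drop_last: bool = False) -> int:
    """Number of optimizer steps in one full pass over the dataset."""
    if drop_last:
        # Incomplete final batch is skipped (e.g. DataLoader(drop_last=True)).
        return num_examples // batch_size
    # A partial final batch still counts as one optimizer step.
    return math.ceil(num_examples / batch_size)

print(steps_per_epoch(100_000, 100))  # 1000 steps per epoch
```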
Why it matters
Epoch count controls both your compute budget and your overfitting risk. Early stopping based on validation loss is the standard way to decide how many epochs are enough. Conflating "epoch" with "step" is a common source of confusion; always be explicit about which unit you mean when reporting training progress.
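A minimal early-stopping sketch over epochs; the callable names, `patience` value, and return shape are illustrative assumptions, not a specific library's API:

```python
def train_with_early_stopping(train_one_epoch, eval_val_loss,
                              max_epochs=50, patience=3):
    """Stop once validation loss hasn't improved for `patience` epochs."""
    best_loss, epochs_without_improvement = float("inf"), 0
    for epoch in range(max_epochs):
        train_one_epoch()                 # one full pass over the training set
        val_loss = eval_val_loss()        # evaluate on held-out data
        if val_loss < best_loss:
            best_loss, epochs_without_improvement = val_loss, 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break                     # no recent improvement: stop early
    return epoch + 1, best_loss           # epochs actually run, best val loss
```

In practice you would also checkpoint the model weights at each new best, so you can restore the best-performing epoch rather than the last one.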
Example
for epoch in range(num_epochs):      # one epoch = one full pass over the data
    for batch in loader:             # one batch = one optimizer step
        loss = model(batch).loss
        loss.backward()
        optim.step()
        optim.zero_grad()
You’ll hear it when
- Reading training logs (epoch 3/10, train loss 0.42).
- Configuring an early-stopping callback.
- Comparing training schedules in papers.
- Estimating how long a run will take.