Overfitting
In one line
When a model memorizes training data instead of learning patterns that generalize to unseen data.
What it actually means
Overfitting shows up as a widening gap between training and validation metrics. Training loss keeps falling while validation loss flattens out or starts climbing. The model has enough capacity to fit noise — outliers, mislabelled examples, quirks of the sampling — and it does. The cure is some combination of more data, less capacity, regularization (dropout, weight decay), early stopping, or data augmentation.
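The cures listed above mostly reduce to one reflex: watch the validation loss and stop before the gap opens. Here is a minimal sketch of that logic in plain Python — the `EarlyStopping` class and `patience` parameter are illustrative conventions, not any specific framework's API:

```python
class EarlyStopping:
    """Stop training when validation loss hasn't improved for `patience` epochs.

    Illustrative sketch, not a specific library's API.
    """

    def __init__(self, patience=3):
        self.patience = patience
        self.best = float("inf")   # best validation loss seen so far
        self.bad_epochs = 0        # consecutive epochs without improvement

    def step(self, val_loss):
        # Returns True when training should stop.
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Validation loss that falls, flattens, then climbs — the overfitting signature.
stopper = EarlyStopping(patience=2)
val_losses = [1.45, 0.90, 0.62, 0.65, 0.71]
for epoch, loss in enumerate(val_losses, start=1):
    if stopper.step(loss):
        print(f"stopping at epoch {epoch} (best val loss {stopper.best:.2f})")
        break
```

The key design point: early stopping keys off *validation* loss, never training loss — training loss keeps falling right through the point where the model starts memorizing.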
Why it matters
A model that overfits is worse than useless: it lies to you. Your offline metrics look great, you ship, and live performance collapses. Almost every ML failure mode in production is some flavour of overfitting — to the training distribution, to the eval set, to the labelers, to the prompt. Watching the train/val gap is the single most useful habit in classical ML training.
Example
epoch   train_loss   val_loss
  1       1.42         1.45
  5       0.31         0.62    ← gap widening
 10       0.05         0.71    ← clearly overfit
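To see memorization in its purest form, consider a toy sketch: a 1-nearest-neighbour "model" trained on pure label noise scores 100% on its own training set and roughly chance on held-out data. There is no pattern to learn, so every point of training accuracy above chance is overfitting. All names here are illustrative:

```python
import random

random.seed(0)

def one_nn_predict(train, x):
    # Return the label of the closest training point — memorization, nothing more.
    return min(train, key=lambda p: abs(p[0] - x))[1]

# Features are random, labels are coin flips: no signal exists.
train = [(random.random(), random.randint(0, 1)) for _ in range(50)]
val = [(random.random(), random.randint(0, 1)) for _ in range(50)]

train_acc = sum(one_nn_predict(train, x) == y for x, y in train) / len(train)
val_acc = sum(one_nn_predict(train, x) == y for x, y in val) / len(val)

print(f"train accuracy: {train_acc:.0%}")  # 100% — each point is its own nearest neighbour
print(f"val accuracy:   {val_acc:.0%}")    # near chance — nothing generalized
```

The train/val gap here is the whole story: a perfect training score on noise tells you nothing except that the model had the capacity to memorize.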
You’ll hear it when
- Tuning a deep network and the val loss bottoms out way above the train loss.
- Reviewing a model that “looked great in the notebook” but flopped in the A/B test.
- Choosing regularization or augmentation strategies.
- Designing a proper held-out test set.
- Reading about the bias-variance tradeoff.