Overfitting

Classical ML

In one line

When a model memorizes training data instead of learning patterns that generalize to unseen data.

What it actually means

Overfitting shows up as a widening gap between training and validation metrics. Training loss keeps falling while validation loss flattens out or starts climbing. The model has enough capacity to fit noise — outliers, mislabelled examples, quirks of the sampling — and it does. The cure is some combination of more data, less capacity, regularization (dropout, weight decay), early stopping, or data augmentation.
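Early stopping is the cheapest of those fixes to demonstrate. Below is a minimal sketch (not any particular library's API) of a patience-based rule: track the best validation loss seen so far and stop once it hasn't improved for `patience` epochs. The function name and the loss values are made up for illustration.

```python
# Hypothetical early-stopping sketch: stop when validation loss has not
# improved for `patience` consecutive epochs, and remember the best epoch
# (whose weights you would restore in a real training loop).

def early_stop_epoch(val_losses, patience=2):
    """Return (epoch to stop at, epoch with the best validation loss)."""
    best_epoch, best_loss = 0, float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch  # stop here; restore best_epoch weights
    return len(val_losses) - 1, best_epoch

# Validation loss falls, then climbs: the classic overfitting curve.
stop, best = early_stop_epoch([1.45, 0.90, 0.62, 0.65, 0.71, 0.80])
# stop == 4, best == 2 — training halts two epochs after the minimum.
```

In a real run you would checkpoint the model at each new best epoch and reload that checkpoint when the rule fires, rather than keeping the final weights.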

Why it matters

A model that overfits is worse than useless: it lies to you. Your offline metrics look great, you ship, and live performance collapses. Almost every ML failure mode in production is some flavour of overfitting — to the training distribution, to the eval set, to the labelers, to the prompt. Watching the train/val gap is the single most useful habit in classical ML training.

Example

epoch  train_loss  val_loss
  1      1.42       1.45
  5      0.31       0.62   ← gap widening
 10      0.05       0.71   ← clearly overfit
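The same check can be automated. A small sketch using the numbers from the table above, with an arbitrary gap threshold of 0.3 (a real threshold would depend on your loss scale):

```python
# Flag epochs where the train/val gap exceeds a (hypothetical) threshold.
history = [(1, 1.42, 1.45), (5, 0.31, 0.62), (10, 0.05, 0.71)]  # from the table

flags = []
for epoch, train_loss, val_loss in history:
    gap = val_loss - train_loss
    flags.append((epoch, round(gap, 2), gap > 0.3))

# flags: [(1, 0.03, False), (5, 0.31, True), (10, 0.66, True)]
```

In practice you would log this gap every epoch and alert (or early-stop) once it starts trending upward, rather than eyeballing the table after the fact.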

You’ll hear it when

  • Tuning a deep network and the val loss bottoms out way above the train loss.
  • Reviewing a model that “looked great in the notebook” but flopped in an A/B test.
  • Choosing regularization or augmentation strategies.
  • Designing a proper held-out test set.
  • Reading about the bias-variance tradeoff.

Related terms