Loss Function

objective · cost function
Deep Learning

A scalar score that measures how wrong a model’s predictions are — the thing the optimizer tries to make smaller.


In one line

A scalar score that measures how wrong a model’s predictions are on a batch — the thing the optimizer tries to make smaller.

What it actually means

The loss takes the model’s output and the ground-truth label and returns a single number that says “you were this far off”. For regression you usually use mean squared error. For classification it’s cross-entropy. For language modelling it’s cross-entropy over the next-token distribution. You can add regularization terms (weight decay, KL penalties, contrastive losses) directly to the loss to push the model toward extra properties. Whatever shape the data takes, training is just gradient descent on this function.
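The claim that regularizers simply add onto the loss can be sketched in plain Python. This is a toy illustration with made-up numbers; `mse` and `l2_penalty` are illustrative helpers, not library functions:

```python
def mse(preds, targets):
    """Mean squared error: average of squared differences."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)

def l2_penalty(weights, lam):
    """Weight decay: lam times the sum of squared weights."""
    return lam * sum(w * w for w in weights)

# Total loss = data term + regularization term — still one scalar.
loss = mse([2.5, 0.0], [3.0, -0.5]) + l2_penalty([0.3, -0.2], lam=0.01)
```

The optimizer never sees the two terms separately: it just descends on their sum, which is what pushes the model toward both accuracy and small weights at once.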

Why it matters

The loss function defines what the model actually learns. If the loss doesn’t match the thing you care about, you’ll get a model that scores well on training and falls over in production. Most of the cleverness in modern training — RLHF, DPO, contrastive learning — is really just inventing better loss functions for the same backbone.

Example

import torch
import torch.nn.functional as F

model = torch.nn.Linear(10, 3)          # toy classifier
x = torch.randn(4, 10)                  # [batch, features]
labels = torch.randint(0, 3, (4,))      # [batch]
logits = model(x)                       # [batch, n_classes]
loss = F.cross_entropy(logits, labels)  # mean over the batch → scalar
loss.backward()                         # gradients land in model.parameters()

You’ll hear it when

  • Picking a loss for a new task (“why MSE and not Huber?”).
  • Reading RLHF or DPO papers.
  • Diagnosing training instability.
  • Setting up a multi-task model with a weighted sum of losses.
  • Discussing why a model overfits to a metric instead of the goal.

Related terms