Bias
The statistical error introduced when a model is too simple to capture the true structure of the data — not the societal fairness kind.
In one line
Error from the model being too simple — a straight line trying to fit a curve.
What it actually means
In the bias–variance decomposition of generalization error, bias is the part that comes from the model family itself not being flexible enough. High bias means the model systematically underfits: it carries strong assumptions that don’t match reality. A linear regression on a clearly non-linear dataset has high bias no matter how much data you throw at it. Variance, the other model-dependent component (the third term in the decomposition is irreducible noise), is error from being too sensitive to the particular training set. Bias and variance trade off against each other as you change model capacity.
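A minimal sketch of the "more data doesn't help" point, using a hypothetical synthetic setup: fit a straight line by ordinary least squares to data generated from y = x², and watch the training error stay high no matter the sample size. All names here (`make_data`, `fit_line`, `mse`) are illustrative, not from any particular library.

```python
import random

random.seed(0)

def make_data(n):
    # non-linear ground truth (y = x^2) plus a little Gaussian noise
    xs = [random.uniform(-1, 1) for _ in range(n)]
    ys = [x * x + random.gauss(0, 0.05) for x in xs]
    return xs, ys

def fit_line(xs, ys):
    # closed-form ordinary least squares for y = a*x + b
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    a = cov / var
    return a, my - a * mx

def mse(xs, ys, a, b):
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys)) / len(xs)

for n in (50, 50_000):
    xs, ys = make_data(n)
    a, b = fit_line(xs, ys)
    print(f"n={n:>6}  train MSE = {mse(xs, ys, a, b):.3f}")
```

With x uniform on [-1, 1], the best any line can do against x² leaves a residual variance of about 0.09, so the printed error hovers there at both sample sizes: that floor is the bias term, and no amount of data lowers it.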
This is a different concept from “bias” as in fairness (demographic or societal bias in model outputs). When a stats-heavy colleague says “high bias”, they mean underfitting.
Why it matters
Every model tuning decision — capacity, regularization, feature count — is implicitly a move along the bias–variance axis. Naming the thing you’re trading off keeps the conversation honest.
Example
Train error: 0.30, test error: 0.32 → high bias (both bad, close together)
Train error: 0.01, test error: 0.30 → high variance (huge gap)
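The two readings above can be captured in a tiny heuristic. This is a hedged sketch, not a standard rule: the `high_err` and `gap_ratio` thresholds are made up for illustration and depend entirely on your task's error scale.

```python
def diagnose(train_err, test_err, high_err=0.1, gap_ratio=5.0):
    # Illustrative thresholds only -- calibrate to your own problem.
    if train_err > high_err and test_err - train_err < train_err:
        return "high bias"       # both errors bad and close together
    if train_err < high_err and test_err > gap_ratio * max(train_err, 1e-9):
        return "high variance"   # small train error, huge train/test gap
    return "ok"

print(diagnose(0.30, 0.32))  # → high bias
print(diagnose(0.01, 0.30))  # → high variance
```

The useful part is not the thresholds but the shape of the check: high bias shows up as two bad, similar errors; high variance as a wide gap with a near-zero training error.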
You’ll hear it when
- Diagnosing why a model won’t learn (“high bias — try a bigger model”).
- Discussing the bias–variance tradeoff in an interview.
- Reviewing learning curves.
- Clarifying which “bias” a coworker means.