Feature Engineering

Classical ML

In one line

The craft of deriving inputs that make a model’s job easier — usually by encoding domain knowledge the raw data doesn’t surface.

What it actually means

Feature engineering is everything that happens between raw rows and the matrix you feed to a model: parsing timestamps into hour-of-week, bucketing rare categories, computing rolling averages, joining lookup tables, normalizing numeric ranges, encoding categoricals, and synthesising interactions. In classical ML it’s where most of the win lives — a linear model with thoughtful features will routinely beat a tree on default features. Deep learning shifted some of this onto the model itself, but tabular and time-series workflows still live and die by it.
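Two of the steps above — bucketing rare categories and encoding categoricals — can be sketched in a few lines of pandas. This is a minimal illustration on hypothetical toy data (the `city` column and the count threshold of 3 are invented for the example):

```python
import pandas as pd

# Hypothetical toy data: one high-cardinality categorical column.
df = pd.DataFrame({"city": ["NYC", "NYC", "NYC", "LA", "LA", "Zug"]})

# Collapse categories seen fewer than 3 times into a single "other" bucket,
# then one-hot encode the bucketed column.
counts = df["city"].value_counts()
rare = counts[counts < 3].index
df["city_bucketed"] = df["city"].where(~df["city"].isin(rare), "other")
df = pd.get_dummies(df, columns=["city_bucketed"])  # adds city_bucketed_* columns
```

The threshold is a knob: too low and rare levels leak noise into the model, too high and you throw away real signal.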

Why it matters

A great feature can replace a year of model tweaking. It’s also where bugs hide: a feature computed differently in training and serving is the most common cause of “the model worked offline and broke in production”. If you’re working in tabular ML, feature engineering plus a good evaluation loop is most of the job.
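One common guard against that training/serving mismatch is to compute each feature with a single shared function that both pipelines import, rather than re-implementing the transform twice. A minimal sketch, with a hypothetical `amount_log` transform standing in for any real feature:

```python
import math

def amount_log(amount: float) -> float:
    """Shared feature transform, imported by both training and serving code."""
    return math.log1p(amount)

# Training path: applied over a batch of historical rows.
train_features = [amount_log(a) for a in [0.0, 9.0, 99.0]]

# Serving path: applied to a single live request — same function, same value,
# so there is no second implementation to drift out of sync.
assert amount_log(9.0) == train_features[1]
```

Feature stores generalize this idea: one definition, materialized both offline (for training) and online (for serving).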

Example

import numpy as np
import pandas as pd

df = df.sort_values(["user_id", "ts"])  # group order must match row order below
df["hour"] = df["ts"].dt.hour
df["is_weekend"] = df["ts"].dt.dayofweek >= 5
df["amount_log"] = np.log1p(df["amount"])
# Time-based rolling windows need a datetime index, hence set_index("ts").
df["amount_per_user_30d"] = (
    df.set_index("ts").groupby("user_id")["amount"].rolling("30d").mean().to_numpy()
)

You’ll hear it when

  • Working on any tabular ML problem (fraud, churn, credit, ranking).
  • Building a feature store and arguing about online/offline parity.
  • Investigating training-serving skew.
  • Comparing classical ML to a deep model on a tabular benchmark.
  • Onboarding a new dataset and deciding what to compute upstream.

Related terms