Feature Engineering
In one line
The craft of deriving inputs that make a model’s job easier — usually by encoding domain knowledge the raw data doesn’t surface.
What it actually means
Feature engineering is everything that happens between raw rows and the matrix you feed to a model: parsing timestamps into hour-of-week, bucketing rare categories, computing rolling averages, joining lookup tables, normalizing numeric ranges, encoding categoricals, and synthesizing interactions. In classical ML it's where most of the win lives: a linear model with thoughtful features will routinely beat a tree model trained on raw, unengineered inputs. Deep learning shifted some of this work onto the model itself, but tabular and time-series workflows still live and die by it.
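A minimal sketch of one transform from that list, bucketing rare categories before one-hot encoding; the column names, data, and threshold here are invented for illustration:

```python
import pandas as pd

# Hypothetical data; "city" stands in for any high-cardinality categorical.
df = pd.DataFrame({"city": ["NY", "NY", "NY", "LA", "LA", "SF", "Oslo"]})

min_count = 2  # assumed threshold; tune per dataset
counts = df["city"].value_counts()
rare = counts[counts < min_count].index

# Collapse rare levels into a single "other" bucket, then one-hot encode
# so every rare city shares one indicator column instead of its own.
df["city_bucketed"] = df["city"].where(~df["city"].isin(rare), "other")
dummies = pd.get_dummies(df["city_bucketed"], prefix="city")
```

Bucketing first keeps the encoded matrix narrow and stops the model from memorizing levels it saw only once or twice.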
Why it matters
A great feature can replace a year of model tweaking. It’s also where bugs hide: a feature computed differently in training and serving is the most common cause of “the model worked offline and broke in production”. If you’re working in tabular ML, feature engineering plus a good evaluation loop is most of the job.
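One common defense against that training/serving split is to route both paths through a single feature function; this is a hedged sketch with an invented function and toy data, not a prescribed pattern:

```python
import numpy as np
import pandas as pd

# Hypothetical shared transform: both the batch training pipeline and the
# single-row serving path call this, so the logic cannot silently drift.
def amount_features(amount: pd.Series) -> pd.DataFrame:
    return pd.DataFrame({
        "amount_log": np.log1p(amount),
        "amount_is_zero": (amount == 0).astype("int8"),
    })

# Offline (training) path: a whole column at once.
train = pd.DataFrame({"amount": [0.0, 10.0, 99.0]})
offline = amount_features(train["amount"])

# Online (serving) path: one row goes through the same code.
online = amount_features(pd.Series([10.0]))

# Parity check: serving output must match the training-time value exactly.
assert online["amount_log"].iloc[0] == offline["amount_log"].iloc[1]
```

Feature stores formalize this idea; the parity assertion above is the kind of check an online/offline consistency test automates.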
Example
import numpy as np
import pandas as pd

df["hour"] = df["ts"].dt.hour
df["is_weekend"] = df["ts"].dt.dayofweek >= 5
df["amount_log"] = np.log1p(df["amount"])
# Time-based rolling needs rows sorted per user and a datetime index.
df = df.sort_values(["user_id", "ts"])
df["amount_per_user_30d"] = df.set_index("ts").groupby("user_id")["amount"].rolling("30d").mean().to_numpy()
You’ll hear it when
- Working on any tabular ML problem (fraud, churn, credit, ranking).
- Building a feature store and arguing about online/offline parity.
- Investigating training-serving skew.
- Comparing classical ML to a deep model on a tabular benchmark.
- Onboarding a new dataset and deciding what to compute upstream.