XGBoost
A gradient-boosted decision tree library known for winning tabular-data competitions and for still being the right default for structured data.
In one line
An efficient, regularized gradient-boosted tree ensemble — the strong default for tabular data.
What it actually means
XGBoost builds an ensemble of shallow decision trees one at a time, each tree trained to correct the residual errors of the sum of the previous trees. The innovations over classic GBDT are a regularized loss (penalizing tree complexity explicitly), second-order gradient information, sparsity-aware splits, column subsampling, and a very fast histogram-based split finder. The result is a library that’s faster, more accurate, and more robust to overfitting than older boosting implementations. LightGBM and CatBoost are close cousins that dominate the same niche.
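The residual-correction loop above can be sketched in a few lines. This is a minimal illustration of plain gradient boosting for squared error using shallow sklearn regression trees as weak learners, not XGBoost's actual implementation (the synthetic sine data and hyperparameters are assumptions for the demo):

```python
# Sketch of gradient boosting: each tree fits the residuals of the
# current ensemble, and its (shrunken) predictions are added in.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, size=200)

learning_rate = 0.1
pred = np.full_like(y, y.mean())    # start from the base prediction
trees = []
for _ in range(50):
    residual = y - pred             # negative gradient of squared error
    tree = DecisionTreeRegressor(max_depth=3).fit(X, residual)
    pred += learning_rate * tree.predict(X)   # correct the ensemble's errors
    trees.append(tree)

print(f"training MSE: {np.mean((y - pred) ** 2):.4f}")
```

With 50 rounds the training MSE drops toward the noise floor; XGBoost adds the regularization, second-order gradients, and fast split finding on top of this same skeleton.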
Why it matters
For tabular problems — churn, credit, fraud, clickthrough, conversion — XGBoost is still the honest baseline to beat, and neural networks usually lose. It’s fast to train, handles missing values, works well with minimal feature engineering, and the feature_importances_ output gives you real debugging signal. If your data fits in RAM and it’s in rows-and-columns shape, start here before reaching for anything fancier.
Example
from xgboost import XGBClassifier

clf = XGBClassifier(
    n_estimators=500,        # number of boosting rounds
    max_depth=6,             # depth of each tree
    learning_rate=0.05,      # shrinkage applied to each tree's contribution
    subsample=0.8,           # row subsampling per tree
    colsample_bytree=0.8,    # column subsampling per tree
    eval_metric="logloss",
)
clf.fit(X_train, y_train, eval_set=[(X_val, y_val)], verbose=False)
You’ll hear it when
- Building anything on tabular data.
- Competing on Kaggle or similar.
- Benchmarking neural nets against “just XGBoost”.
- Discussing SHAP values and model explainability.