Machine Learning

Ensemble Methods

Bagging, boosting, gradient boosting, and ensemble learning strategies.

Ensemble Learning

Why Ensembles Work

Individual models make different errors; averaging or voting can cancel out some of this noise.
Ensembles reduce variance (bagging) or bias (boosting), depending on the method.
They are a standard tool in winning solutions to ML competitions.
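
The error-cancelling effect can be checked with a small simulation. This is a minimal sketch under a strong assumption: each model's error is independent Gaussian noise around the true value. The names TRUE_VALUE, N_MODELS and N_TRIALS are illustrative only.

```python
import random
import statistics

random.seed(0)

TRUE_VALUE = 10.0   # the quantity every model is trying to predict
N_MODELS = 25       # hypothetical ensemble size
N_TRIALS = 2000     # how many times the experiment is repeated


def noisy_prediction():
    """One model's prediction: the truth plus independent Gaussian noise."""
    return TRUE_VALUE + random.gauss(0.0, 1.0)


# Predictions from a single model, one per trial.
single = [noisy_prediction() for _ in range(N_TRIALS)]

# Predictions from an ensemble that averages N_MODELS models, one per trial.
averaged = [
    statistics.fmean(noisy_prediction() for _ in range(N_MODELS))
    for _ in range(N_TRIALS)
]

var_single = statistics.variance(single)
var_averaged = statistics.variance(averaged)
print(f"variance of a single model:    {var_single:.3f}")
print(f"variance of the ensemble mean: {var_averaged:.3f}")
```

With fully independent errors the variance of the mean shrinks roughly in proportion to the number of models. Real models trained on the same data make correlated errors, so the gain in practice is smaller.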

Bagging (Bootstrap Aggregating)

Bagging trains multiple base learners independently on different bootstrap samples of the training data and then averages their predictions (or takes a majority vote for classification).

Reduces variance of high-variance models like Decision Trees.
Random Forest is the most popular bagging-based ensemble.
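
Bagging can be written by hand in a few lines, which makes the two steps (bootstrap, then aggregate) explicit. The sketch below assumes a synthetic binary dataset from make_classification as a stand-in for real data; the choice of 25 trees is arbitrary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption; substitute your own X and y).
X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

rng = np.random.default_rng(0)
n_trees = 25
trees = []
for _ in range(n_trees):
    # Bootstrap sample: draw len(X_train) row indices with replacement.
    idx = rng.integers(0, len(X_train), size=len(X_train))
    tree = DecisionTreeClassifier(random_state=0)
    tree.fit(X_train[idx], y_train[idx])
    trees.append(tree)

# Aggregate: majority vote over the trees' 0/1 predictions.
votes = np.stack([t.predict(X_test) for t in trees])  # shape (n_trees, n_test)
bagged_pred = (votes.mean(axis=0) >= 0.5).astype(int)

single_acc = (trees[0].predict(X_test) == y_test).mean()
bagged_acc = (bagged_pred == y_test).mean()
print(f"single tree accuracy: {single_acc:.3f}")
print(f"bagged accuracy:      {bagged_acc:.3f}")
```

In practice sklearn.ensemble.BaggingClassifier or RandomForestClassifier does the same job with less code; Random Forest additionally samples a random subset of features at each split.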

Boosting

Boosting trains base learners sequentially, where each new model focuses more on the mistakes of the previous ones.

Examples: AdaBoost, Gradient Boosting, XGBoost, LightGBM, CatBoost.
These methods often achieve state-of-the-art results on tabular data.
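
To see the sequential behaviour, one option is to track accuracy after each boosting round. The sketch below uses scikit-learn's AdaBoostClassifier on synthetic data (an assumption; any labelled dataset works) and reads the per-round scores from staged_score.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption; substitute your own X and y).
X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# The default base learner is a depth-1 decision tree (a "stump").
ada = AdaBoostClassifier(n_estimators=50, random_state=0)
ada.fit(X_train, y_train)

# staged_score yields the test accuracy after each boosting round.
staged_acc = list(ada.staged_score(X_test, y_test))
print(f"after 1 round:   {staged_acc[0]:.3f}")
print(f"after {len(staged_acc)} rounds: {staged_acc[-1]:.3f}")
```

Each stump on its own is a weak model; the ensemble becomes useful because later stumps are fit with more weight on the examples earlier ones got wrong.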

Stacking

Stacking combines the outputs of diverse base models (trees, linear models, neural nets) using a meta-learner.

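from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC

# Base models: deliberately chosen from different model families.
estimators = [
    ("dt", DecisionTreeClassifier(max_depth=5)),
    ("svm", SVC(probability=True)),  # probability=True enables predict_proba
]

# The meta-learner is trained on the base models' cross-validated predictions.
stack = StackingClassifier(
    estimators=estimators,
    final_estimator=LogisticRegression()
)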
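
The snippet above only builds the estimator. A possible end-to-end run, assuming synthetic data from make_classification in place of a real dataset, looks like this:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption; substitute your own X and y).
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

stack = StackingClassifier(
    estimators=[
        ("dt", DecisionTreeClassifier(max_depth=5, random_state=0)),
        ("svm", SVC(probability=True, random_state=0)),
    ],
    final_estimator=LogisticRegression(),
    cv=5,  # the meta-learner sees 5-fold cross-validated base predictions
)
stack.fit(X_train, y_train)

stack_acc = stack.score(X_test, y_test)
print(f"stacking accuracy: {stack_acc:.3f}")
```

After fitting, the refit base models are available as stack.estimators_ and the meta-learner as stack.final_estimator_.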

Practical Tips for Ensembles

Start with simple ensembles like Random Forest before trying more complex stacks.
Use cross-validation to generate out-of-fold predictions when stacking to avoid leakage.
Watch out for training time and memory usage, especially with many large base models.
On tabular data, tree-based ensembles (Random Forest, Gradient Boosting) are usually the strongest baseline.
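
The out-of-fold tip can be made concrete with cross_val_predict. This is a sketch of building meta-features by hand, assuming synthetic data and two arbitrary base models; StackingClassifier performs an equivalent step internally.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption; substitute your own X and y).
X, y = make_classification(n_samples=400, n_features=10, random_state=0)

base_models = [
    DecisionTreeClassifier(max_depth=5, random_state=0),
    RandomForestClassifier(n_estimators=50, random_state=0),
]

# Out-of-fold predictions: each row is scored by a model that was
# trained on the other folds and never saw that row.
oof_columns = [
    cross_val_predict(model, X, y, cv=5, method="predict_proba")[:, 1]
    for model in base_models
]
meta_features = np.column_stack(oof_columns)  # shape (n_samples, n_models)

# The meta-learner is trained only on these leakage-free predictions.
meta_learner = LogisticRegression()
meta_learner.fit(meta_features, y)
print(meta_features.shape)
```

To predict on new data, the base models still need to be refit on the full training set, and their predictions passed to meta_learner in the same column order.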

Gradient Boosting

Intuition

Start with a simple base prediction (e.g., mean of targets).
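Fit a new tree to the residuals (errors) of the current model; for losses other than squared error, the tree is fit to the negative gradient of the loss instead.
Add this new tree to the ensemble, scaled by a learning rate.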
Repeat for many iterations to gradually minimize the loss function.
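
These four steps can be sketched from scratch for regression with squared error, where the residual is exactly the negative gradient. The data (a noisy sine curve), the tree depth and the helper name predict are assumptions made for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy regression data (an assumption): a noisy sine curve.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=300)

learning_rate = 0.1
n_rounds = 100

# Step 1: start from a constant prediction, the mean of the targets.
base_prediction = y.mean()
prediction = np.full_like(y, base_prediction)
trees = []
mse_history = [np.mean((y - prediction) ** 2)]

for _ in range(n_rounds):
    # Step 2: fit a small tree to the residuals of the current model.
    residuals = y - prediction
    tree = DecisionTreeRegressor(max_depth=2, random_state=0)
    tree.fit(X, residuals)
    # Step 3: add the new tree, scaled by the learning rate.
    prediction += learning_rate * tree.predict(X)
    trees.append(tree)
    # Step 4: repeat; record the training loss to watch it fall.
    mse_history.append(np.mean((y - prediction) ** 2))


def predict(X_new):
    """Sum the base prediction and every tree's scaled contribution."""
    out = np.full(X_new.shape[0], base_prediction)
    for tree in trees:
        out += learning_rate * tree.predict(X_new)
    return out


print(f"training MSE before boosting: {mse_history[0]:.4f}")
print(f"training MSE after {n_rounds} rounds: {mse_history[-1]:.4f}")
```

Only the training loss is tracked here, so a falling curve says nothing about overfitting; a held-out set is needed for that.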

GradientBoostingClassifier with scikit-learn

Basic Gradient Boosting example

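from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Example dataset (an assumption; replace with your own X and y).
X, y = load_breast_cancer(return_X_y=True)

# Hold out 20% for testing; stratify keeps class proportions equal in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

gb = GradientBoostingClassifier(
    n_estimators=200,     # number of boosting stages (trees)
    learning_rate=0.05,   # shrinks each tree's contribution
    max_depth=3,          # depth of each individual tree
    random_state=42
)
gb.fit(X_train, y_train)

# Per-class precision, recall and F1 on the held-out test set.
y_pred = gb.predict(X_test)
print(classification_report(y_test, y_pred))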

Advanced Gradient Boosting (XGBoost, LightGBM)

Modern gradient boosting libraries add powerful optimizations:

XGBoost: regularization, tree pruning, parallelization.
LightGBM: histogram-based splits, leaf-wise growth, very fast on large datasets.
CatBoost: strong support for categorical features.

Key Hyperparameters

n_estimators: number of boosting stages (too high → overfitting, too low → underfitting).
learning_rate: how much each tree contributes; lower values often need more trees.
max_depth / max_leaf_nodes: control tree complexity.
subsample: using < 1.0 adds randomness and can improve generalization.
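
One way to see how n_estimators interacts with the other settings is to score the model after every stage on a validation set. The sketch below assumes synthetic data and arbitrary hyperparameter values; staged_predict yields the predictions after 1, 2, ..., n_estimators trees.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption; substitute your own X and y).
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

gb = GradientBoostingClassifier(
    n_estimators=300,
    learning_rate=0.05,
    max_depth=3,
    subsample=0.8,   # each tree is fit on a random 80% of the training rows
    random_state=0,
)
gb.fit(X_train, y_train)

# Validation accuracy after each boosting stage.
val_acc = [accuracy_score(y_val, pred) for pred in gb.staged_predict(X_val)]
best_n = int(np.argmax(val_acc)) + 1  # stages are counted from 1

print(f"best number of trees on validation data: {best_n}")
print(f"validation accuracy at that point: {val_acc[best_n - 1]:.3f}")
```

Because best_n is chosen on the validation set, the accuracy printed for it is optimistic; a separate test set is needed for an unbiased estimate.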
