Random Forest

Random Forest is an ensemble of Decision Trees that improves accuracy and robustness over a single tree by combining the predictions of many diverse trees.

Intuition

  • Train multiple Decision Trees on different bootstrap samples of the data (bagging).
  • At each split, consider only a random subset of features (feature randomness).
  • Average the trees' predictions (regression) or take a majority vote (classification).
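
The three steps above can be sketched by hand with plain Decision Trees. This is a minimal illustration under stated assumptions, not how scikit-learn implements its forests internally: the synthetic dataset, the tree count, and the names `X_toy`, `y_toy`, `trees`, and `forest_pred` are all made up for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary-classification data, purely for illustration.
X_toy, y_toy = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=0
)

rng = np.random.default_rng(0)
n_trees = 25  # odd, so a binary majority vote cannot tie
trees = []
for i in range(n_trees):
    # Bagging: draw as many rows as the dataset has, with replacement.
    idx = rng.integers(0, len(X_toy), size=len(X_toy))
    # Feature randomness: max_features="sqrt" makes every split consider
    # only a random subset of the features.
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=i)
    tree.fit(X_toy[idx], y_toy[idx])
    trees.append(tree)

# Majority vote: labels are 0/1, so the mean vote exceeds 0.5 exactly
# when more than half of the trees predict class 1.
votes = np.stack([t.predict(X_toy) for t in trees])  # (n_trees, n_samples)
forest_pred = (votes.mean(axis=0) > 0.5).astype(int)

print("training accuracy of the hand-built ensemble:",
      (forest_pred == y_toy).mean())
```

For regression, the same loop would use `DecisionTreeRegressor` and replace the vote with a plain mean of the trees' predictions.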

Random Forest with scikit-learn

RandomForestClassifier example
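from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Example data (an assumption for illustration; substitute your own X and y).
data = load_breast_cancer()
X, y = data.data, data.target
feature_names = data.feature_names

# Hold out 20% for testing; stratify to preserve class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

rf = RandomForestClassifier(
    n_estimators=200,   # number of trees in the forest
    max_depth=None,     # no depth limit on individual trees
    random_state=42,    # reproducible results
    n_jobs=-1,          # use all available CPU cores
)
rf.fit(X_train, y_train)

# Evaluate on the held-out test set.
y_pred = rf.predict(X_test)
print(classification_report(y_test, y_pred))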
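
To see how a fitted forest combines its trees, the per-tree outputs can be aggregated by hand and compared with the forest's own predictions. In scikit-learn, the classifier averages the class probabilities of its trees and picks the class with the highest average, which is a soft form of majority voting. The sketch below uses a synthetic dataset and the hypothetical names `X_demo`, `y_demo`, and `rf_demo`; it is a check on a toy example, not part of the workflow above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data (an assumption; any numeric X, y would do).
X_demo, y_demo = make_classification(
    n_samples=200, n_features=6, random_state=0
)

rf_demo = RandomForestClassifier(n_estimators=25, random_state=0)
rf_demo.fit(X_demo, y_demo)

# Average the class probabilities predicted by each fitted tree.
tree_probas = np.stack(
    [tree.predict_proba(X_demo) for tree in rf_demo.estimators_]
)  # shape: (n_trees, n_samples, n_classes)
mean_proba = tree_probas.mean(axis=0)

# The predicted class is the one with the highest averaged probability.
manual_pred = rf_demo.classes_[mean_proba.argmax(axis=1)]

print(np.allclose(mean_proba, rf_demo.predict_proba(X_demo)))
print(np.array_equal(manual_pred, rf_demo.predict(X_demo)))
```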

Feature Importance

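Random Forests can estimate feature importance from how much each feature reduces impurity (for example, Gini impurity) at the splits where it is used, averaged across the trees in the forest. In scikit-learn these scores are exposed as feature_importances_ and sum to 1. Impurity-based scores tend to favor features with many unique values, so treat the ranking as a rough guide.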

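# feature_names is assumed to hold the column names, in the same order as the columns of X.
importances = rf.feature_importances_
# Print features from most to least important.
for name, score in sorted(zip(feature_names, importances),
                          key=lambda x: x[1], reverse=True):
    print(f"{name}: {score:.3f}")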
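
The phrase "across the forest" can be made concrete: each fitted tree carries its own impurity-based importances, and averaging them over the trees (then renormalizing) should reproduce the forest-level scores. The following is a small check on synthetic data, with `X_imp`, `y_imp`, and `rf_imp` as hypothetical names chosen for this example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data with 3 informative features out of 6 (illustrative only).
X_imp, y_imp = make_classification(
    n_samples=200, n_features=6, n_informative=3, random_state=0
)
rf_imp = RandomForestClassifier(n_estimators=50, random_state=0)
rf_imp.fit(X_imp, y_imp)

# Each tree exposes its own normalized impurity-based importances.
per_tree = np.array(
    [tree.feature_importances_ for tree in rf_imp.estimators_]
)  # shape: (n_trees, n_features)

# Average over trees, then renormalize so the scores sum to 1.
manual = per_tree.mean(axis=0)
manual = manual / manual.sum()

print(np.allclose(manual, rf_imp.feature_importances_))
```

The spread of `per_tree` along axis 0 (for example `per_tree.std(axis=0)`) gives a rough sense of how stable each feature's score is from tree to tree.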