Random Forest

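A Random Forest is an ensemble of Decision Trees. Each tree is trained slightly differently, and combining the predictions of many such diverse trees reduces variance, which usually makes the forest more accurate and less prone to overfitting than any single tree.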

Intuition

Train multiple Decision Trees on different bootstrap samples of the data (bagging).
At each split, consider only a random subset of features (feature randomness).
Combine the trees: average their predictions (regression) or take a majority vote (classification).
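
The three steps above can be sketched by hand with plain Decision Trees. This is only a minimal illustration, not how scikit-learn implements its forests: the Iris data is a stand-in, and the names n_trees, trees, votes, and majority are made up for this example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Stand-in data for illustration only.
X, y = load_iris(return_X_y=True)
n_classes = len(np.unique(y))
rng = np.random.default_rng(0)

n_trees = 25  # hypothetical forest size for this sketch
trees = []
for _ in range(n_trees):
    # Bagging: draw len(X) row indices with replacement (a bootstrap sample).
    idx = rng.integers(0, len(X), size=len(X))
    # Feature randomness: max_features="sqrt" makes the tree look at a
    # random subset of the features at every split.
    tree = DecisionTreeClassifier(
        max_features="sqrt",
        random_state=int(rng.integers(1_000_000)),
    )
    tree.fit(X[idx], y[idx])
    trees.append(tree)

# Majority vote: every tree predicts a class, the most frequent class wins.
votes = np.stack([t.predict(X) for t in trees])  # shape (n_trees, n_samples)
majority = np.array(
    [np.bincount(col, minlength=n_classes).argmax() for col in votes.T]
)
print("Training accuracy of the hand-rolled forest:", (majority == y).mean())
```

For regression, the same loop would use DecisionTreeRegressor and replace the vote with votes.mean(axis=0).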

Random Forest with scikit-learn

RandomForestClassifier example

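from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Example data so the snippet runs as-is (an assumption, not part of the
# original lesson). Replace with your own X, y, and feature_names.
data = load_iris()
X, y, feature_names = data.data, data.target, data.feature_names

# Hold out 20% for testing; stratify keeps class proportions equal in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

rf = RandomForestClassifier(
    n_estimators=200,   # number of trees in the forest
    max_depth=None,     # grow each tree until its leaves are pure
    random_state=42,    # reproducible bootstrap samples and feature subsets
    n_jobs=-1           # train trees in parallel on all CPU cores
)
rf.fit(X_train, y_train)

# Evaluate on the held-out test set.
y_pred = rf.predict(X_test)
print(classification_report(y_test, y_pred))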
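
It may help to see how the trees inside a fitted forest are combined. According to the scikit-learn documentation, RandomForestClassifier averages the class probabilities predicted by its trees rather than counting one hard vote per tree. The sketch below checks this on stand-in Iris data; the names forest, tree_probas, and manual_proba are made up for this example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Stand-in data for illustration only.
X_demo, y_demo = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=50, random_state=0)
forest.fit(X_demo, y_demo)

# estimators_ holds the individual fitted trees.
# Stack their class probabilities: shape (n_trees, n_samples, n_classes).
tree_probas = np.stack([t.predict_proba(X_demo) for t in forest.estimators_])

# Average over the tree axis by hand and compare with the forest's own output.
manual_proba = tree_probas.mean(axis=0)
print(np.allclose(manual_proba, forest.predict_proba(X_demo)))
```

The predicted class is then the one with the highest averaged probability, which in most cases agrees with a plain majority vote.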

Feature Importance

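Random Forests estimate feature importance from how much each feature reduces impurity (for example, Gini impurity) at the splits that use it, averaged over all trees in the forest. The scores are computed from the training data and tend to favor features with many distinct values, so treat them as a rough ranking rather than a precise measurement.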

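# rf is the fitted RandomForestClassifier from the example above.
# feature_names is assumed to be a list of column names in the same
# order as the columns of X.
importances = rf.feature_importances_  # one score per feature; scores sum to 1

# Print features from most to least important.
for name, score in sorted(zip(feature_names, importances),
                          key=lambda x: x[1], reverse=True):
    print(f"{name}: {score:.3f}")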
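
To make "across the forest" concrete, the sketch below recomputes the forest-level scores from the individual trees and also reports how much they vary from tree to tree. It assumes stand-in Iris data, and the names model, per_tree, mean_imp, and std_imp are made up for this example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Stand-in data for illustration only.
data = load_iris()
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(data.data, data.target)

# Each fitted tree exposes its own impurity-based importances.
per_tree = np.array([t.feature_importances_ for t in model.estimators_])

mean_imp = per_tree.mean(axis=0)  # should match model.feature_importances_
std_imp = per_tree.std(axis=0)    # spread across trees, a rough stability check

for name, m, s in zip(data.feature_names, mean_imp, std_imp):
    print(f"{name}: {m:.3f} +/- {s:.3f}")
```

A large spread relative to the mean suggests that the ranking of that feature depends heavily on which bootstrap sample a tree happened to see.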
