Random Forest
Random Forest is an ensemble of Decision Trees that improves accuracy and robustness by averaging the predictions of many diverse trees.
Intuition
- Train multiple Decision Trees on different bootstrap samples of the data (bagging).
- At each split, consider only a random subset of features (feature randomness).
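- Aggregate the trees: average predictions for regression, or take a majority vote for classification (scikit-learn averages the trees' predicted class probabilities rather than counting hard votes).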
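The three steps above can be sketched by hand with plain Decision Trees. This is a minimal illustration, not how scikit-learn builds its forests internally. The synthetic dataset, the tree count, and the names `trees`, `votes`, and `forest_pred` are assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary data, used only for illustration (not from the original text).
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=0
)

rng = np.random.default_rng(0)
n_trees = 25  # odd, so a binary majority vote cannot tie
trees = []

for _ in range(n_trees):
    # Bagging: draw a bootstrap sample (same size as the data, with replacement).
    idx = rng.integers(0, len(X), size=len(X))
    # Feature randomness: max_features="sqrt" makes each split consider
    # only a random subset of the features.
    tree = DecisionTreeClassifier(
        max_features="sqrt",
        random_state=int(rng.integers(0, 2**31 - 1)),
    )
    tree.fit(X[idx], y[idx])
    trees.append(tree)

# Majority vote: every tree predicts a label (0 or 1) and the most common wins.
votes = np.stack([t.predict(X) for t in trees])  # shape (n_trees, n_samples)
forest_pred = (votes.mean(axis=0) > 0.5).astype(int)

train_accuracy = (forest_pred == y).mean()
print(f"Training accuracy of the hand-built forest: {train_accuracy:.3f}")
```

Accuracy is measured on the training data here only to keep the sketch short. A real evaluation would use a held-out test set.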
Random Forest with scikit-learn
RandomForestClassifier example
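from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# X (feature matrix) and y (class labels) are assumed to be loaded already.
# stratify=y keeps the class proportions the same in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

rf = RandomForestClassifier(
    n_estimators=200,   # number of trees in the forest
    max_depth=None,     # grow each tree until its leaves are pure
    random_state=42,    # reproducible bootstrap samples and feature subsets
    n_jobs=-1,          # use all available CPU cores
)
rf.fit(X_train, y_train)

# Evaluate on the held-out test set.
y_pred = rf.predict(X_test)
print(classification_report(y_test, y_pred))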
Feature Importance
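Random Forests can estimate feature importance from how much each feature reduces impurity, averaged across all trees in the forest. In scikit-learn these scores are exposed as feature_importances_ and are normalized to sum to 1. Impurity-based importances tend to favor features with many unique values, so treat the ranking as a guide rather than a definitive measure.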
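# feature_names is assumed to be a list of column names in the same order as X.
importances = rf.feature_importances_

# Print features from most to least important.
for name, score in sorted(zip(feature_names, importances),
                          key=lambda x: x[1], reverse=True):
    print(f"{name}: {score:.3f}")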
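A single importance score hides how much the trees disagree with one another. One way to check this is to look at the spread of the per-tree importances. The sketch below is self-contained and uses a synthetic dataset with hypothetical feature names (`feature_0`, `feature_1`, ...). The variables `per_tree` and `spread` are introduced only for this example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data with hypothetical feature names, for illustration only.
X, y = make_classification(
    n_samples=500, n_features=6, n_informative=3, random_state=0
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Forest-level, impurity-based importances (normalized to sum to 1).
importances = rf.feature_importances_

# Every fitted tree exposes its own importances; the standard deviation
# across trees indicates how stable each feature's score is.
per_tree = np.array([tree.feature_importances_ for tree in rf.estimators_])
spread = per_tree.std(axis=0)

# Report features from most to least important.
for i in np.argsort(importances)[::-1]:
    print(f"{feature_names[i]}: {importances[i]:.3f} +/- {spread[i]:.3f}")
```

A feature with a high score and a large spread is being used heavily by some trees and barely at all by others, which often happens when several features carry similar information.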