Machine Learning
Decision Trees
Interpretable Models
Decision Trees partition the feature space into axis-aligned rectangles and assign a simple decision rule to each, producing models that are powerful yet easy to interpret.
Core Idea
A Decision Tree recursively splits the data on feature values to create groups that are as homogeneous as possible in the target variable.
- Internal nodes test conditions on features (\(x_j < t\)).
- Leaves store predictions (class label or mean value).
- We choose splits that maximize purity improvement.
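To make the last bullet concrete, here is a minimal sketch of exhaustive split search on a single feature, choosing the threshold that maximizes the drop in Gini impurity. The helper names (`gini`, `best_split`) are illustrative, not part of any library:

```python
import numpy as np

def gini(y):
    """Gini impurity of a label array."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(x, y):
    """Threshold on one feature that most reduces weighted Gini impurity,
    searched exhaustively over midpoints between consecutive sorted values."""
    parent = gini(y)
    best_t, best_gain = None, 0.0
    order = np.argsort(x)
    x, y = x[order], y[order]
    for i in range(1, len(x)):
        if x[i] == x[i - 1]:
            continue  # no threshold can separate identical values
        t = (x[i] + x[i - 1]) / 2
        left, right = y[:i], y[i:]
        child = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        gain = parent - child
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain

# Labels separate cleanly between x = 2 and x = 3
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([0, 0, 1, 1])
t, gain = best_split(x, y)  # t = 2.5, gain = 0.5
```

A real tree repeats this search over every feature at every node, which is exactly what the recursive splitting above describes.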
Splitting Criteria
For classification trees, scikit‑learn supports:
- Gini impurity (default): \( G = \sum p_k (1 - p_k) \).
- Entropy: \( H = - \sum p_k \log_2 p_k \).
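The two criteria can be evaluated directly from the class proportions \(p_k\) of a node; a small sketch (function names are illustrative):

```python
import numpy as np

def gini(p):
    """Gini impurity: G = sum p_k (1 - p_k)."""
    p = np.asarray(p, dtype=float)
    return float(np.sum(p * (1 - p)))

def entropy(p):
    """Entropy: H = -sum p_k log2 p_k, with 0 * log 0 treated as 0."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

# A 50/50 node is maximally impure for two classes
g = gini([0.5, 0.5])     # 0.5
h = entropy([0.5, 0.5])  # 1.0
```

Both measures peak at a uniform class mix and reach zero for a pure node, so in practice they usually produce very similar trees.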
Decision Tree Classifier with scikit-learn
Training a Decision Tree
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# X, y: feature matrix and class labels, assumed already loaded
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

dt = DecisionTreeClassifier(
    max_depth=5,        # limit depth to curb overfitting
    criterion="gini",   # impurity measure used for splits
    random_state=42
)
dt.fit(X_train, y_train)

y_pred = dt.predict(X_test)
print(classification_report(y_test, y_pred))
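Since interpretability is the selling point here, a trained tree can be printed as nested if/else rules with scikit-learn's `export_text`. The snippet above leaves `X, y` undefined, so this sketch uses the Iris dataset as a stand-in:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=42)
clf.fit(iris.data, iris.target)

# Print the learned splits as human-readable rules
rules = export_text(clf, feature_names=list(iris.feature_names))
print(rules)
```

Each indented line is one internal-node test, and each leaf line reports the predicted class, matching the node/leaf structure described under Core Idea.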