Machine Learning
Decision Trees
Interpretable Models
Decision Trees partition the feature space into axis-aligned rectangles and assign a simple decision rule to each, producing models that are powerful yet easy to interpret.
Core Idea
A Decision Tree recursively splits the data on feature values to create groups that are as homogeneous as possible in the target variable.
- Internal nodes test conditions on features (\(x_j < t\)).
- Leaves store predictions (class label or mean value).
- We choose splits that maximize purity improvement.
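To make the last bullet concrete, here is a minimal sketch of exhaustive split search on a single feature, choosing the threshold that maximizes the drop in Gini impurity. The helper names (`gini`, `best_split`) are illustrative, not part of any library:

```python
import numpy as np

def gini(y):
    """Gini impurity of a label array."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(x, y):
    """Threshold on one feature that most reduces weighted Gini impurity,
    searched exhaustively over midpoints between consecutive sorted values."""
    parent = gini(y)
    best_t, best_gain = None, 0.0
    order = np.argsort(x)
    x, y = x[order], y[order]
    for i in range(1, len(x)):
        if x[i] == x[i - 1]:
            continue  # no threshold can separate identical values
        t = (x[i] + x[i - 1]) / 2
        left, right = y[:i], y[i:]
        child = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        gain = parent - child
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain

# Labels separate cleanly between x = 2 and x = 3
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([0, 0, 1, 1])
t, gain = best_split(x, y)  # t = 2.5, gain = 0.5
```

A real tree repeats this search over every feature at every node, which is exactly what the recursive splitting above describes.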
Splitting Criteria
For classification trees, scikit‑learn supports:
- Gini impurity (default): \( G = \sum p_k (1 - p_k) \).
- Entropy: \( H = - \sum p_k \log_2 p_k \).
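The two criteria can be evaluated directly from the class proportions \(p_k\) of a node; a small sketch (function names are illustrative):

```python
import numpy as np

def gini(p):
    """Gini impurity: G = sum p_k (1 - p_k)."""
    p = np.asarray(p, dtype=float)
    return float(np.sum(p * (1 - p)))

def entropy(p):
    """Entropy: H = -sum p_k log2 p_k, with 0 * log 0 treated as 0."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

# A 50/50 node is maximally impure for two classes
g = gini([0.5, 0.5])     # 0.5
h = entropy([0.5, 0.5])  # 1.0
```

Both measures peak at a uniform class mix and reach zero for a pure node, so in practice they usually produce very similar trees.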
Decision Tree Classifier with scikit-learn
Training a Decision Tree
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# X, y: feature matrix and class labels, assumed already loaded
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

dt = DecisionTreeClassifier(
    max_depth=5,        # limit depth to curb overfitting
    criterion="gini",   # impurity measure used for splits
    random_state=42
)
dt.fit(X_train, y_train)

y_pred = dt.predict(X_test)
print(classification_report(y_test, y_pred))
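Since interpretability is the selling point here, a trained tree can be printed as nested if/else rules with scikit-learn's `export_text`. The snippet above leaves `X, y` undefined, so this sketch uses the Iris dataset as a stand-in:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=42)
clf.fit(iris.data, iris.target)

# Print the learned splits as human-readable rules
rules = export_text(clf, feature_names=list(iris.feature_names))
print(rules)
```

Each indented line is one internal-node test, and each leaf line reports the predicted class, matching the node/leaf structure described under Core Idea.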