Machine Learning Decision Trees
Interpretable Models

Decision Trees

Decision Trees partition the feature space into axis-aligned rectangles and assign a simple prediction rule to each region, producing models that are powerful yet easy to interpret.

Core Idea

A Decision Tree recursively splits the data on feature values, creating groups that are increasingly homogeneous in the target variable.

  • Internal nodes test conditions on features (e.g., \(x_j \le t\)).
  • Leaves store predictions (class label or mean value).
  • We choose splits that maximize purity improvement.
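The greedy split search described above can be sketched on a single feature. This is a minimal illustration (not scikit-learn's implementation): for each candidate threshold, it computes the weighted impurity of the two children and keeps the threshold with the largest purity improvement, i.e., the lowest weighted impurity.

```python
import numpy as np

def gini(labels):
    # Gini impurity of a label array: sum_k p_k (1 - p_k)
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(np.sum(p * (1 - p)))

def best_split(x, y):
    # Greedy search over thresholds on one feature: pick the split
    # that minimizes the weighted Gini impurity of the children.
    best_t, best_impurity = None, float("inf")
    for t in np.unique(x)[:-1]:
        left, right = y[x <= t], y[x > t]
        w = len(left) / len(y)
        impurity = w * gini(left) + (1 - w) * gini(right)
        if impurity < best_impurity:
            best_t, best_impurity = t, impurity
    return best_t, best_impurity

# Toy data: the two classes separate cleanly at x <= 3.0.
x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
y = np.array([0, 0, 0, 1, 1, 1])
print(best_split(x, y))  # (3.0, 0.0): a perfectly pure split
```

A full tree simply applies this search recursively to each child until a stopping criterion (e.g., maximum depth or minimum leaf size) is reached.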

Splitting Criteria

For classification trees, scikit‑learn supports:

  • Gini impurity (default): \( G = \sum_k p_k (1 - p_k) \).
  • Entropy: \( H = - \sum_k p_k \log_2 p_k \).
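The two criteria behave similarly in practice. A quick numerical check (a sketch, not library code) shows that a pure node scores zero under both, while a 50/50 two-class node is maximally impure:

```python
import numpy as np

def gini(p):
    # G = sum_k p_k (1 - p_k)
    p = np.asarray(p)
    return float(np.sum(p * (1 - p)))

def entropy(p):
    # H = sum_k p_k log2(1 / p_k); terms with p_k = 0 contribute 0
    p = np.asarray(p)
    p = p[p > 0]
    return float(np.sum(p * np.log2(1 / p)))

print(gini([1.0, 0.0]), entropy([1.0, 0.0]))  # 0.0 0.0  (pure node)
print(gini([0.5, 0.5]), entropy([0.5, 0.5]))  # 0.5 1.0  (maximally impure)
```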

Decision Tree Classifier with scikit-learn

Training a Decision Tree
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Example data so the snippet runs as-is; replace with your own X and y.
X, y = load_iris(return_X_y=True)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

dt = DecisionTreeClassifier(
    max_depth=5,        # cap the depth to limit overfitting
    criterion="gini",   # or "entropy"
    random_state=42
)
dt.fit(X_train, y_train)

y_pred = dt.predict(X_test)
print(classification_report(y_test, y_pred))
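Because the fitted model is a set of nested threshold rules, it can be inspected directly. As an illustration (using the iris dataset as an assumed example), `sklearn.tree.export_text` prints the learned rules in readable if/else form:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
feature_names = ["sepal length", "sepal width", "petal length", "petal width"]

# A shallow tree keeps the printed rules short and easy to read.
dt = DecisionTreeClassifier(max_depth=2, random_state=42)
dt.fit(X, y)

# Each leaf line shows the predicted class for that region.
print(export_text(dt, feature_names=feature_names))
```

This printout is the sense in which decision trees are interpretable: every prediction can be traced to a small chain of feature thresholds.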