Machine Learning

Model Evaluation

Evaluation metrics, validation strategies, and cross-validation for machine learning models.

Model Evaluation & Validation

Classification Metrics

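For classification problems, accuracy alone is often misleading, especially with imbalanced data: on a 95/5 class split, a model that always predicts the majority class reaches 95% accuracy while finding no positives at all. We use a family of metrics: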

  • Accuracy: overall proportion of correct predictions, (TP + TN) / (TP + TN + FP + FN).
  • Precision: out of predicted positives, how many are truly positive, TP / (TP + FP).
  • Recall: out of actual positives, how many we found, TP / (TP + FN).
  • F1-score: harmonic mean of precision and recall.
  • Confusion matrix: full breakdown of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN).

Computing common classification metrics
from sklearn.metrics import (
    accuracy_score,
    precision_score,
    recall_score,
    f1_score,
    confusion_matrix
)

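# Assumes `model` is an already fitted binary classifier and that
# X_test, y_test are held-out features and labels.
y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
# average="binary" scores only the positive class (label 1 by default);
# for multiclass targets use "macro", "micro", or "weighted" instead.
precision = precision_score(y_test, y_pred, average="binary")
recall = recall_score(y_test, y_pred, average="binary")
f1 = f1_score(y_test, y_pred, average="binary")
# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)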
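
To connect the metric definitions to the confusion matrix, the sketch below recomputes each metric by hand from the four counts. The labels and predictions are made-up illustrative data (90 negatives, 10 positives), not results from any real model.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score

# Hypothetical imbalanced labels: 90 negatives followed by 10 positives.
y_true = np.array([0] * 90 + [1] * 10)
# Hypothetical predictions: 2 false alarms among the negatives,
# and only 4 of the 10 positives detected.
y_pred = np.array([1] * 2 + [0] * 88 + [1] * 4 + [0] * 6)

# For binary labels {0, 1}, ravel() returns the counts as TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")

# The hand-computed values should agree with scikit-learn's helpers.
assert np.isclose(precision, precision_score(y_true, y_pred))
assert np.isclose(recall, recall_score(y_true, y_pred))
assert np.isclose(f1, f1_score(y_true, y_pred))
```

With this toy data the accuracy is 0.92 even though recall is only 0.40, which is the imbalance problem in miniature.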

Regression Metrics

For regression, we measure how far predictions are from the true numeric values.

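  • MAE (Mean Absolute Error): average absolute difference between predictions and true values.
  • MSE (Mean Squared Error): average squared difference; penalizes large errors more.
  • RMSE: square root of MSE, same units as the target.
  • R²: proportion of variance in the target explained by the model (1 is a perfect fit; 0 is no better than always predicting the mean).

Computing regression metrics with scikit-learn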
from sklearn.metrics import (
    mean_absolute_error,
    mean_squared_error,
    r2_score
)
import numpy as np

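# Assumes `reg_model` is an already fitted regressor and that
# X_test, y_test are held-out features and numeric targets.
y_pred = reg_model.predict(X_test)

mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)  # back in the units of the target
r2 = r2_score(y_test, y_pred)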
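
As a check on the definitions, the following sketch computes the same four metrics directly with NumPy and compares them with scikit-learn. The four target values and predictions are invented for illustration only.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical true targets and predictions.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.0, 5.0, 8.0, 11.0])

errors = y_true - y_pred                 # [1, 0, -1, -2]

mae = np.mean(np.abs(errors))            # mean of |error|
mse = np.mean(errors ** 2)               # mean of error squared
rmse = np.sqrt(mse)                      # same units as the target

# R^2 = 1 - (residual sum of squares) / (total sum of squares around the mean)
ss_res = np.sum(errors ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print(f"MAE={mae:.3f} MSE={mse:.3f} RMSE={rmse:.3f} R2={r2:.3f}")

# The manual results should match scikit-learn's implementations.
assert np.isclose(mae, mean_absolute_error(y_true, y_pred))
assert np.isclose(mse, mean_squared_error(y_true, y_pred))
assert np.isclose(r2, r2_score(y_true, y_pred))
```

Note how the single error of 2 contributes 4 of the 6 units of squared error, which is why MSE and RMSE react more strongly to large misses than MAE does.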

Cross-Validation

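Instead of relying on a single train/test split, k‑fold cross‑validation partitions the data into k folds, trains on k−1 of them, and evaluates on the remaining fold, rotating until every fold has served once as the test set. Averaging the k scores gives a more stable estimate of performance than any one split.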
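
The procedure can be written out as an explicit loop. This is a minimal sketch, assuming a synthetic dataset from `make_classification` and a `LogisticRegression` classifier, both chosen here only for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold

# Synthetic data standing in for a real dataset (assumption for this sketch).
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

kfold = KFold(n_splits=5, shuffle=True, random_state=0)
fold_scores = []

for train_idx, test_idx in kfold.split(X):
    # Fit a fresh model on the k-1 training folds...
    model = LogisticRegression(max_iter=1000)
    model.fit(X[train_idx], y[train_idx])
    # ...and score it on the single held-out fold.
    y_pred = model.predict(X[test_idx])
    fold_scores.append(accuracy_score(y[test_idx], y_pred))

fold_scores = np.array(fold_scores)
print("Per-fold accuracy:", fold_scores)
print("Mean:", fold_scores.mean(), "Std:", fold_scores.std())
```

When a classifier is passed with an integer `cv`, `cross_val_score` uses stratified folds without shuffling, so its numbers will generally differ slightly from this plain shuffled `KFold` loop.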

K-fold cross-validation in scikit-learn
from sklearn.model_selection import cross_val_score

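# Assumes `model` is an estimator and X, y are the full feature matrix
# and labels; cross_val_score clones and refits the model on each fold.
scores = cross_val_score(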
    model,
    X,
    y,
    cv=5,
    scoring="accuracy"
)

print("CV accuracy:", scores.mean(), "+/-", scores.std())

Tip: always choose metrics that align with your business goal. For example, prioritize recall for medical diagnosis, where a missed case is costly, and precision for fraud alerts, where false alarms are costly.
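
One way to act on this during cross-validation is to change the `scoring` argument. The sketch below assumes a synthetic imbalanced dataset and a `LogisticRegression` model, both picked only for illustration; `"recall"` and `"precision"` are built-in scikit-learn scorer names.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic data with roughly 20% positives (assumption for this sketch).
X, y = make_classification(
    n_samples=300, n_features=8, weights=[0.8, 0.2], random_state=0
)
model = LogisticRegression(max_iter=1000)

# Score the same model under two different goals.
recall_scores = cross_val_score(model, X, y, cv=5, scoring="recall")
precision_scores = cross_val_score(model, X, y, cv=5, scoring="precision")

print("CV recall:   ", recall_scores.mean(), "+/-", recall_scores.std())
print("CV precision:", precision_scores.mean(), "+/-", precision_scores.std())
```

Comparing the two summaries shows which kind of error the model is making more often, which is the information the business goal should be weighed against.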