Model Evaluation & Validation

Learn how to measure the performance of machine learning models with metrics that fit the task and with robust validation techniques.

Classification Metrics

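For classification problems, accuracy alone is often misleading, especially with imbalanced data: a model that always predicts the majority class can score high accuracy while missing every minority-class example. Several complementary metrics give a fuller picture: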

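Confusion matrix: full breakdown of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN).
Accuracy: proportion of all predictions that are correct, (TP + TN) / (TP + FP + TN + FN).
Precision: of the samples predicted positive, the fraction that are truly positive, TP / (TP + FP).
Recall: of the samples that are actually positive, the fraction the model found, TP / (TP + FN).
F1-score: harmonic mean of precision and recall, 2 * precision * recall / (precision + recall).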
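
As a worked illustration of the definitions above, the sketch below computes each metric by hand from a set of confusion-matrix counts. The counts are hypothetical, invented for this example: 1,000 samples, of which only 50 are actually positive. It shows how accuracy can look strong while recall stays poor.

```python
# Hypothetical confusion-matrix counts for an imbalanced problem:
# 1,000 samples in total, 50 of them truly positive (tp + fn = 50).
tp, fp, tn, fn = 10, 5, 945, 40

accuracy = (tp + tn) / (tp + fp + tn + fn)          # share of correct predictions
precision = tp / (tp + fp)                          # how trustworthy a positive prediction is
recall = tp / (tp + fn)                             # how many real positives were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
```

With these counts the accuracy is 95.5%, yet the model finds only 20% of the real positives, which is why recall and F1 are worth reporting alongside accuracy.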

Computing common classification metrics

from sklearn.metrics import (
    accuracy_score,
    precision_score,
    recall_score,
    f1_score,
    confusion_matrix
)

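# Assumes `model` is an already-fitted binary classifier and that
# X_test, y_test are a held-out test split.
y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
# average="binary" (the default) scores only the positive class;
# use "macro", "micro", or "weighted" for multiclass targets.
precision = precision_score(y_test, y_pred, average="binary")
recall = recall_score(y_test, y_pred, average="binary")
f1 = f1_score(y_test, y_pred, average="binary")
# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)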

Regression Metrics

For regression, we measure how far predictions are from the true numeric values.

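MAE (Mean Absolute Error): average absolute difference between predictions and true values, in the units of the target.
MSE (Mean Squared Error): average squared difference; penalizes large errors more heavily than MAE.
RMSE (Root Mean Squared Error): square root of MSE, back in the same units as the target.
R²: proportion of the target's variance explained by the model; 1.0 is a perfect fit, and the value can be negative when the model does worse than always predicting the mean.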
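
To make the definitions above concrete, here is a minimal sketch that computes the four metrics by hand using only the standard library. The values in `y_true` and `y_pred` are hypothetical, chosen so the arithmetic is easy to follow.

```python
import math

# Hypothetical true values and predictions, chosen for easy arithmetic.
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 8.0]

n = len(y_true)
errors = [t - p for t, p in zip(y_true, y_pred)]  # residuals: 1, 0, -2, -1

mae = sum(abs(e) for e in errors) / n   # mean absolute error
mse = sum(e ** 2 for e in errors) / n   # mean squared error
rmse = math.sqrt(mse)                   # same units as the target

# R² compares the model's squared error with that of a baseline
# that always predicts the mean of the true values.
mean_true = sum(y_true) / n
ss_res = sum(e ** 2 for e in errors)
ss_tot = sum((t - mean_true) ** 2 for t in y_true)
r2 = 1 - ss_res / ss_tot

print(f"MAE={mae:.3f} MSE={mse:.3f} RMSE={rmse:.3f} R2={r2:.3f}")
```

The single residual of size 2 contributes 4 of the 6 units of squared error, which illustrates why MSE and RMSE react more strongly to large errors than MAE does.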

Computing regression metrics with scikit-learn

from sklearn.metrics import (
    mean_absolute_error,
    mean_squared_error,
    r2_score
)
import numpy as np

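# Assumes `reg_model` is an already-fitted regressor and that
# X_test, y_test are a held-out test split.
y_pred = reg_model.predict(X_test)

mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)  # back in the units of the target
r2 = r2_score(y_test, y_pred)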

Cross-Validation

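Instead of relying on a single train/test split, k-fold cross-validation splits the data into k folds, trains on k - 1 of them, evaluates on the remaining fold, and repeats until every fold has served once as the test set. Averaging the k scores gives a more stable estimate of performance.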

K-fold cross-validation in scikit-learn

from sklearn.model_selection import cross_val_score

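# Assumes `model` is a scikit-learn classifier and X, y hold the full
# dataset; the model is cloned and refit on each of the 5 folds.
scores = cross_val_score(
    model,
    X,
    y,
    cv=5,
    scoring="accuracy"
)

print(f"CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")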
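
To show what the splitting step looks like, here is a minimal pure-Python sketch of plain, unshuffled k-fold index generation. The helper `k_fold_indices` is a hypothetical name written for this illustration, not a scikit-learn function. scikit-learn's own splitters offer more (for example, `cross_val_score` uses stratified folds for classifiers when `cv` is an integer), so treat this only as a picture of the basic idea.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous, unshuffled folds.

    Hypothetical helper for illustration only.
    """
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        # Everything outside the current test fold is used for training.
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


folds = list(k_fold_indices(n_samples=10, k=5))
for i, (train_idx, test_idx) in enumerate(folds):
    print(f"fold {i}: test={test_idx} train_size={len(train_idx)}")
```

Each sample lands in exactly one test fold, so every data point is used for evaluation once and for training k - 1 times.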

Tip: choose metrics that align with your business goal. For example, prioritize recall when a missed positive is costly (medical diagnosis) and precision when false alarms are costly (fraud alerts).
