Evaluation metrics, validation strategies, and cross-validation for machine learning models.

Model Evaluation & Validation

Classification Metrics

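For classification problems, accuracy alone is often misleading, especially with imbalanced data: a model that always predicts the majority class can score high accuracy while never finding a single positive. We therefore use a family of metrics: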

Accuracy: overall proportion of correct predictions.
Precision: out of predicted positives, how many are truly positive.
Recall: out of actual positives, how many we found.
F1-score: harmonic mean of precision and recall.
Confusion matrix: full breakdown of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN).
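To make the definitions above concrete, here is a minimal sketch that derives each metric from the four confusion-matrix counts. The function name `classification_metrics`, the example counts, and the convention of reporting 0.0 when a denominator is zero are assumptions made for illustration, not part of any library.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    # Assumed convention: a metric with a zero denominator is reported as 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical imbalanced test set: 1000 samples, only 10 actual positives.
metrics = classification_metrics(tp=5, fp=5, tn=985, fn=5)
print(metrics)
```

With these made-up counts, accuracy comes out at 0.99 while precision, recall, and F1 are all 0.5, which is the kind of gap that makes accuracy misleading on imbalanced data.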

Computing common classification metrics

from sklearn.metrics import (
    accuracy_score,
    precision_score,
    recall_score,
    f1_score,
    confusion_matrix
)

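# Assumes `model` is an already-fitted binary classifier and
# X_test, y_test are held-out features and labels.
y_pred = model.predict(X_test)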

accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, average="binary")
recall = recall_score(y_test, y_pred, average="binary")
f1 = f1_score(y_test, y_pred, average="binary")
cm = confusion_matrix(y_test, y_pred)

Regression Metrics

For regression, we measure how far predictions are from the true numeric values.

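MAE (Mean Absolute Error): average absolute difference between predictions and true values.
MSE (Mean Squared Error): average squared difference; penalizes large errors more heavily than MAE.
RMSE (Root Mean Squared Error): square root of MSE, in the same units as the target.
R²: proportion of the target's variance explained by the model.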
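As a rough illustration of the definitions above, the following sketch computes the four metrics by hand on a tiny made-up dataset. The function name `regression_metrics` and the sample values are hypothetical, and the sketch assumes the true values are not all identical (otherwise R² is undefined).

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MAE, MSE, RMSE, and R^2 from paired true and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    # R^2 = 1 - (residual sum of squares) / (total sum of squares).
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot  # assumes y_true is not constant
    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2}


# Made-up values for illustration only.
reg = regression_metrics(y_true=[3.0, 5.0, 2.0, 7.0], y_pred=[2.0, 5.0, 4.0, 8.0])
print(reg)
```

Here the single error of size 2 contributes 2 to the absolute-error sum but 4 to the squared-error sum, which is why MSE (1.5) sits above MAE (1.0) for the same predictions.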

Computing regression metrics with scikit-learn

from sklearn.metrics import (
    mean_absolute_error,
    mean_squared_error,
    r2_score
)
import numpy as np

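# Assumes `reg_model` is an already-fitted regressor and
# X_test, y_test are held-out features and numeric targets.
y_pred = reg_model.predict(X_test)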

mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_test, y_pred)

Cross-Validation

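Instead of relying on a single train/test split, k-fold cross-validation divides the data into k folds, trains on k-1 of them, and evaluates on the remaining one, rotating until every fold has served as the test set once. Averaging the k scores gives a more stable estimate of performance.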
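The splitting step can be sketched in plain Python. This is a simplified illustration with contiguous, unshuffled folds; the helper name `k_fold_indices` is hypothetical and is not a scikit-learn function.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous, unshuffled folds."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test_idx = list(range(start, stop))
        # Everything outside the current test fold is used for training.
        train_idx = list(range(0, start)) + list(range(stop, n_samples))
        yield train_idx, test_idx
        start = stop


folds = list(k_fold_indices(n_samples=10, k=5))
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
```

Each sample lands in exactly one test fold, so every observation is used for evaluation once and for training k-1 times.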

K-fold cross-validation in scikit-learn

from sklearn.model_selection import cross_val_score

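# Assumes `model` is a scikit-learn classifier and X, y hold the full
# feature matrix and labels; a fresh clone is fitted on each fold.
scores = cross_val_score(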
    model,
    X,
    y,
    cv=5,
    scoring="accuracy"
)

print("CV accuracy:", scores.mean(), "+/-", scores.std())

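Tip: choose metrics that reflect the cost of each kind of error in your application. For example, prioritize recall for medical diagnosis, where a missed case is costly, and precision for fraud alerts, where every false alarm triggers an investigation.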
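One way to act on this is to cross-validate with the metrics that matter rather than accuracy. The sketch below uses scikit-learn's `cross_validate` to report precision and recall side by side; the synthetic dataset from `make_classification` and the choice of `LogisticRegression` are stand-ins assumed for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

# Synthetic, imbalanced stand-in data (roughly 90% negatives); illustrative only.
X, y = make_classification(
    n_samples=500, n_features=10, weights=[0.9, 0.1], random_state=0
)
model = LogisticRegression(max_iter=1000)

# Score each of the 5 folds on both precision and recall.
results = cross_validate(model, X, y, cv=5, scoring=["precision", "recall"])

print("CV precision:", results["test_precision"].mean())
print("CV recall:", results["test_recall"].mean())
```

Comparing the two averages shows which kind of error the model makes more often, which is the information needed to decide whether it fits the goal.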
