Evaluation metrics, validation strategies, and cross-validation for machine learning models.
Model Evaluation & Validation
Classification Metrics
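For classification problems, accuracy alone is often misleading, especially with imbalanced data: a model that always predicts the majority class can score high accuracy while missing every minority-class example. The metrics below give a fuller picture, where TP, FP, TN, and FN are the counts of true positives, false positives, true negatives, and false negatives:
- Accuracy: proportion of all predictions that are correct, (TP + TN) / (TP + FP + TN + FN).
- Precision: of the samples predicted positive, the fraction that are truly positive, TP / (TP + FP).
- Recall: of the actual positives, the fraction the model found, TP / (TP + FN).
- F1-score: harmonic mean of precision and recall, 2 × precision × recall / (precision + recall).
- Confusion matrix: full breakdown of TP, FP, TN, FN.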
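To make these definitions concrete, the sketch below computes the four scores directly from confusion-matrix counts. The counts are hypothetical (a test set of 1,000 samples with only 10 positives), and `safe_div` and `summarize` are helper names introduced here for illustration, not part of any library.

```python
def safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def summarize(tp, fp, tn, fn):
    """Compute accuracy, precision, recall, and F1 from raw counts."""
    accuracy = safe_div(tp + tn, tp + fp + tn + fn)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    f1 = safe_div(2 * precision * recall, precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical model A: always predicts "negative".
always_negative = summarize(tp=0, fp=0, tn=990, fn=10)

# Hypothetical model B: finds 8 of the 10 positives, with 20 false alarms.
detector = summarize(tp=8, fp=20, tn=970, fn=2)

for name, scores in [("always_negative", always_negative), ("detector", detector)]:
    print(name, {k: round(v, 3) for k, v in scores.items()})
```

With these made-up counts, the always-negative model reaches 99% accuracy but has zero recall, while the detector has slightly lower accuracy (97.8%) yet finds 8 of the 10 positives.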
Computing common classification metrics
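from sklearn.metrics import (
    accuracy_score,
    precision_score,
    recall_score,
    f1_score,
    confusion_matrix,
)

# Assumes `model` is a fitted binary classifier and X_test, y_test are held-out data.
y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
# average="binary" scores only the positive class (label 1 by default).
precision = precision_score(y_test, y_pred, average="binary")
recall = recall_score(y_test, y_pred, average="binary")
f1 = f1_score(y_test, y_pred, average="binary")
# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)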
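For a binary problem, the confusion matrix can be unpacked into its four counts. The sketch below uses small hand-made label lists (hypothetical, not drawn from any dataset) and relies on scikit-learn's layout, in which rows are true classes and columns are predicted classes, so `ravel()` yields TN, FP, FN, TP in that order.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels for ten samples (1 = positive, 0 = negative).
y_true = [0, 0, 0, 0, 1, 1, 1, 0, 1, 0]
y_hat = [0, 1, 0, 0, 1, 0, 1, 0, 1, 0]

# Fix the label order explicitly so the matrix layout is unambiguous.
cm = confusion_matrix(y_true, y_hat, labels=[0, 1])

# Flatten the 2x2 matrix row by row: [[TN, FP], [FN, TP]].
tn, fp, fn, tp = cm.ravel()

precision = tp / (tp + fp)
recall = tp / (tp + fn)

print(cm)
print("TN, FP, FN, TP:", tn, fp, fn, tp)
print("precision:", precision, "recall:", recall)
```

Passing `labels=[0, 1]` is optional here, but it guards against the row and column order changing if a class happens to be missing from the data.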
Regression Metrics
For regression, we measure how far predictions are from the true numeric values.
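- MAE (Mean Absolute Error): average absolute difference between predictions and true values.
- MSE (Mean Squared Error): average squared difference; penalizes large errors more heavily than MAE.
- RMSE (Root Mean Squared Error): square root of MSE, expressed in the same units as the target.
- R²: proportion of the target's variance explained by the model. A value of 1 is a perfect fit, 0 is no better than always predicting the mean, and negative values are worse than that.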
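The same quantities can be worked out by hand, which makes the definitions easier to check. This is a minimal sketch with NumPy on four made-up target values and predictions; the arrays are illustrative only.

```python
import numpy as np

# Hypothetical true values and predictions.
y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_hat = np.array([2.0, 5.0, 4.0, 8.0])

errors = y_true - y_hat              # [ 1.,  0., -2., -1.]

mae = np.mean(np.abs(errors))        # (1 + 0 + 2 + 1) / 4 = 1.0
mse = np.mean(errors ** 2)           # (1 + 0 + 4 + 1) / 4 = 1.5
rmse = np.sqrt(mse)                  # about 1.2247

# R² compares the model's squared error with that of predicting the mean.
ss_res = np.sum(errors ** 2)                      # 6.0
ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # 14.75
r2 = 1 - ss_res / ss_tot                          # about 0.5932

print(f"MAE={mae:.4f} MSE={mse:.4f} RMSE={rmse:.4f} R2={r2:.4f}")
```

The single error of 2 contributes half of the MAE but two thirds of the MSE, which is the sense in which MSE penalizes large errors more.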
Computing regression metrics with scikit-learn
import numpy as np
from sklearn.metrics import (
    mean_absolute_error,
    mean_squared_error,
    r2_score,
)

# Assumes `reg_model` is a fitted regressor and X_test, y_test are held-out data.
y_pred = reg_model.predict(X_test)

mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)  # back in the units of the target
r2 = r2_score(y_test, y_pred)
Cross-Validation
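Instead of relying on a single train/test split, k-fold cross-validation divides the data into k folds, trains on k − 1 of them, and evaluates on the remaining fold, rotating until every fold has served once as the test set. Averaging the k scores gives a more stable estimate of performance, and their spread indicates how sensitive the result is to the choice of split.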
K-fold cross-validation in scikit-learn
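from sklearn.model_selection import cross_val_score

# Assumes `model` is a scikit-learn classifier and X, y hold the full features and labels.
scores = cross_val_score(
    model,
    X,
    y,
    cv=5,  # number of folds
    scoring="accuracy",
)
print("CV accuracy:", scores.mean(), "+/-", scores.std())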
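To show what happens inside each fold, here is a rough sketch of the loop written out by hand. It assumes a classifier, for which scikit-learn uses stratified folds when `cv` is an integer; the synthetic dataset and the `LogisticRegression` model are placeholders chosen for illustration, not part of the original example.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold

# Placeholder data and model, for illustration only.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
model = LogisticRegression(max_iter=1000)

splitter = StratifiedKFold(n_splits=5)
fold_scores = []

for train_idx, test_idx in splitter.split(X, y):
    # Train a fresh, unfitted copy so no state leaks between folds.
    fold_model = clone(model)
    fold_model.fit(X[train_idx], y[train_idx])

    # Score only on the fold that was held out from training.
    y_pred = fold_model.predict(X[test_idx])
    fold_scores.append(accuracy_score(y[test_idx], y_pred))

fold_scores = np.array(fold_scores)
print("CV accuracy:", fold_scores.mean(), "+/-", fold_scores.std())
```

Cloning the estimator for every fold matters: reusing one fitted model across folds would let information from earlier folds carry over into later ones.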
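Tip: choose metrics that reflect the cost of each type of error for your business goal. For example, prioritize recall for medical diagnosis, where a missed case is costly, and precision for fraud alerts, where every false alarm has to be investigated.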
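One way to act on this tip is to cross-validate on the metrics you care about rather than on accuracy. The sketch below uses scikit-learn's `cross_validate`, which accepts a list of scorer names; the synthetic, mildly imbalanced dataset and the `LogisticRegression` model are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

# Placeholder data: roughly 80% negatives and 20% positives.
X, y = make_classification(
    n_samples=300,
    n_features=6,
    weights=[0.8, 0.2],
    random_state=0,
)
model = LogisticRegression(max_iter=1000)

# Evaluate several metrics over the same 5 folds in one call.
results = cross_validate(
    model,
    X,
    y,
    cv=5,
    scoring=["precision", "recall", "f1"],
)

# Per-fold scores are stored under "test_<metric name>".
for name in ("precision", "recall", "f1"):
    fold_scores = results[f"test_{name}"]
    print(f"{name}: {fold_scores.mean():.3f} +/- {fold_scores.std():.3f}")
```

Reporting precision and recall side by side makes the trade-off visible, so the metric that matters for the application can drive model selection.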