Evaluation Metrics
Accuracy (correct / total) is easy to interpret but misleading under class imbalance: a 99% negative fraud dataset yields 99% accuracy for a trivial "always negative" classifier. Precision asks: of positive predictions, how many were right? Recall: of actual positives, how many did we catch? F1 is their harmonic mean. ROC-AUC summarizes tradeoffs across thresholds; PR-AUC often suits rare positive classes better.
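The fraud example above can be sketched directly. This toy snippet (with made-up labels) shows accuracy looking excellent while recall exposes the useless classifier:

```python
# Toy data: 99 negatives, 1 positive; a trivial "always negative" classifier.
y_true = [0] * 99 + [1]
y_pred = [0] * 100

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
precision = tp / (tp + fp) if (tp + fp) else 0.0
recall = tp / (tp + fn) if (tp + fn) else 0.0
f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0

print(accuracy, recall)  # 0.99 0.0 — high accuracy, but every fraud case missed
```

Accuracy is 0.99, yet recall (and therefore F1) is 0, which is exactly the failure mode the imbalance warning describes.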
Confusion Matrix & Classification
For binary problems, counts fall into true positive, true negative, false positive, false negative. Precision = TP / (TP + FP); recall = TP / (TP + FN). Choose the metric that reflects business cost: missing fraud (FN) vs annoying users (FP). For multi-class, use macro (average per class, treats classes equally) or micro (pool all decisions—closer to accuracy).
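A minimal sketch of the macro/micro distinction, using assumed 3-class toy labels and per-class recall: macro averages the per-class scores so small classes count equally, while micro pools all decisions (for single-label multi-class, micro recall equals accuracy):

```python
# Toy 3-class labels (assumed for illustration).
y_true = [0, 0, 0, 0, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 0, 1, 1, 0, 2, 2, 2, 0]

classes = sorted(set(y_true))
per_class_recall = []
for c in classes:
    tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
    fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
    per_class_recall.append(tp / (tp + fn))

# Macro: average per-class recall — each class weighted equally.
macro = sum(per_class_recall) / len(classes)
# Micro: pool all decisions — here identical to plain accuracy.
micro = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

With these labels the minority class (class 1, recall 0.5) drags macro down to about 0.67 while micro stays at 0.7, which is why macro is the better alarm bell for per-class failures.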
ROC, AUC, and Calibration
The ROC curve plots true positive rate vs false positive rate as you vary the decision threshold. AUC is the area under ROC—ranking quality of scores independent of one threshold. When positives are rare, inspect the precision–recall curve too. Well-calibrated probabilities matter when outputs drive decisions (expected fraction of positives among 0.7-scored examples ≈ 0.7).
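AUC has a useful ranking interpretation: it is the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one (ties counted as half). A small sketch with assumed toy scores:

```python
# Toy labels and scores (assumed for illustration).
y_true  = [0, 0, 1, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.3, 0.7]

pos = [s for t, s in zip(y_true, y_score) if t == 1]
neg = [s for t, s in zip(y_true, y_score) if t == 0]

# AUC = P(score of random positive > score of random negative), ties worth 0.5.
pairs = [(p, n) for p in pos for n in neg]
auc = sum((p > n) + 0.5 * (p == n) for p, n in pairs) / len(pairs)
```

Here one positive (0.35) ranks below one negative (0.4), so 8 of 9 pairs are ordered correctly and AUC is about 0.89. No threshold appears anywhere in the computation, which is the sense in which AUC measures ranking quality independent of any single cutoff.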
Regression
MAE (mean absolute error) penalizes errors linearly, making it more robust to outliers. MSE / RMSE square errors, so they penalize large errors more heavily. R² describes variance explained relative to a constant (mean-prediction) baseline. Report errors in the same scale as your target (e.g. dollars, degrees) when communicating with stakeholders.
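The three regression metrics fall out of a few lines of arithmetic; this sketch uses assumed toy values:

```python
import math

# Toy regression targets and predictions (assumed for illustration).
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]

n = len(y_true)
mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
rmse = math.sqrt(mse)  # back on the target's scale

# R²: 1 minus residual variance over variance of the constant-mean baseline.
mean = sum(y_true) / n
ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
ss_tot = sum((t - mean) ** 2 for t in y_true)
r2 = 1 - ss_res / ss_tot
```

Note how the single large error (2.0 on the third point) dominates RMSE (≈1.15) far more than MAE (0.875), which is the outlier-sensitivity tradeoff described above.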
Sklearn Example
from sklearn.metrics import classification_report, roc_auc_score
y_true, y_pred, y_proba = [0, 1, 1, 0], [0, 1, 0, 0], [0.2, 0.9, 0.4, 0.1]  # toy data for illustration
print(classification_report(y_true, y_pred, digits=4))  # per-class precision, recall, F1
auc = roc_auc_score(y_true, y_proba)  # threshold-free ranking quality from scores
Summary
- Accuracy alone is insufficient for imbalance; use precision, recall, F1, PR/ROC.
- Pick metrics aligned with error costs and whether you care about ranking or hard labels.
- Regression: MAE vs RMSE vs R² depending on outlier sensitivity.
- Next: PyTorch workflow for building and training nets.
Turn theory into code with PyTorch—modules, tensors, and training loops.