
Evaluation Metrics — 15 Interview Questions

Accuracy pitfalls, precision/recall trade-offs, ROC vs PR curves, and choosing metrics for imbalanced or multi-class problems.


Tags: Accuracy, F1, AUC, Confusion matrix
1. Define accuracy. (Easy)
Answer: (TP+TN) / total, the fraction of correct predictions. Misleading when classes are imbalanced: a model that always predicts the majority class still scores high.
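A minimal sketch with made-up labels (NumPy assumed) showing how accuracy can flatter a useless model on imbalanced data:

```python
import numpy as np

# Hypothetical imbalanced labels: 95 negatives, 5 positives.
y_true = np.array([0] * 95 + [1] * 5)
y_pred = np.zeros_like(y_true)              # always predict the majority class

accuracy = np.mean(y_true == y_pred)        # (TP + TN) / total
print(f"majority-class accuracy: {accuracy:.2f}")  # 0.95, yet zero positives are found
```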
2. Precision vs recall. (Easy)
Answer: Precision = TP/(TP+FP): of the predicted positives, how many are actually positive. Recall = TP/(TP+FN): of the actual positives, how many were found. Moving the decision threshold usually trades one for the other.
Precision = TP/(TP+FP),  Recall = TP/(TP+FN)
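A small from-scratch sketch, on made-up binary labels and predictions, computing both from TP/FP/FN counts:

```python
import numpy as np

def precision_recall(y_true, y_pred):
    """Precision = TP/(TP+FP), Recall = TP/(TP+FN) for 0/1 labels."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 1, 0, 1, 0])
print(precision_recall(y_true, y_pred))   # TP=3, FP=1, FN=1 -> (0.75, 0.75)
```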
3. F1 score. (Easy)
Answer: Harmonic mean of precision and recall: 2PR/(P+R)—penalizes models that are strong on only one; common for imbalanced binary tasks.
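A one-function sketch of the harmonic mean; the numbers are illustrative only:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall: 2PR / (P + R)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(f1_score(0.90, 0.90))   # 0.90: balanced model
print(f1_score(0.99, 0.10))   # ~0.18: the weak side drags the score down
```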
4. Confusion matrix. (Easy)
Answer: Rows/columns for true vs predicted classes; read off TP, FP, TN, FN for binary; extends to multi-class with diagonal = correct.
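A from-scratch multi-class confusion matrix on made-up labels (rows = true class, columns = predicted class):

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Count (true, predicted) pairs; the diagonal holds the correct predictions."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

y_true = [0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0, 2]
print(confusion_matrix(y_true, y_pred, n_classes=3))
# [[1 1 0]
#  [0 2 0]
#  [1 0 2]]
```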
5. ROC curve and AUC. (Medium)
Answer: Plot TPR vs FPR as threshold varies. AUC = area—ranking quality; 0.5 random, 1 perfect. Useful when you care about discrimination across thresholds.
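One way to sketch AUC without plotting the curve: it equals the probability that a randomly chosen positive outranks a randomly chosen negative (ties count as half). The scores below are made up:

```python
import numpy as np

def roc_auc(y_true, scores):
    """AUC as the pairwise ranking statistic: P(positive score > negative score)."""
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

y_true = np.array([0, 0, 1, 1, 0, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7])
print(roc_auc(y_true, scores))   # 8 of 9 positive/negative pairs ranked correctly -> ~0.89
```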
6. PR curve vs ROC when imbalanced. (Medium)
Answer: PR curve (precision vs recall) often more informative with rare positives—ROC can look optimistic because FPR is dominated by negatives.
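A quick illustration on synthetic, heavily imbalanced data (assumes scikit-learn is available; exact numbers depend on the random seed):

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

rng = np.random.default_rng(0)

# 1% positives; score distributions overlap between the classes.
y_true = np.concatenate([np.zeros(9900), np.ones(100)])
scores = np.concatenate([rng.normal(0.0, 1.0, 9900), rng.normal(1.5, 1.0, 100)])

print("ROC-AUC:", round(roc_auc_score(y_true, scores), 3))            # looks strong
print("PR-AUC :", round(average_precision_score(y_true, scores), 3))  # much lower: precision is hurt by the many negatives
```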
7. Macro vs micro F1 (multi-class). (Hard)
Answer: Macro: compute F1 per class and average the scores equally, which emphasizes rare classes. Micro: aggregate TP/FP/FN globally, then compute P/R/F1, which is dominated by frequent classes.
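A from-scratch comparison on a made-up two-class example where one class is rare; macro drops while micro stays high:

```python
import numpy as np

def counts(y_true, y_pred, cls):
    tp = np.sum((y_pred == cls) & (y_true == cls))
    fp = np.sum((y_pred == cls) & (y_true != cls))
    fn = np.sum((y_pred != cls) & (y_true == cls))
    return tp, fp, fn

def f1_from_counts(tp, fp, fn):
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

def macro_micro_f1(y_true, y_pred, classes):
    per_class = [counts(y_true, y_pred, c) for c in classes]
    macro = np.mean([f1_from_counts(*c) for c in per_class])   # each class weighted equally
    tp, fp, fn = np.sum(per_class, axis=0)                     # pool counts, then one F1
    return macro, f1_from_counts(tp, fp, fn)

y_true = np.array([0] * 8 + [1] * 2)    # class 1 is rare
y_pred = np.array([0] * 8 + [0, 1])     # misses half of the rare class
print(macro_micro_f1(y_true, y_pred, classes=[0, 1]))   # macro ~0.80, micro 0.90
```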
8. Top-k accuracy (classification). (Easy)
Answer: Correct if the true label is in the model’s top k predictions. Softer than top-1; used in ImageNet-style benchmarks.
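A short sketch with made-up per-class scores (NumPy assumed):

```python
import numpy as np

def top_k_accuracy(y_true, scores, k):
    """Correct if the true label is among the k highest-scoring classes per sample."""
    top_k = np.argsort(scores, axis=1)[:, -k:]    # indices of the k largest scores
    return np.mean([y in row for y, row in zip(y_true, top_k)])

scores = np.array([[0.5, 0.3, 0.2],
                   [0.1, 0.2, 0.7],
                   [0.6, 0.3, 0.1]])
y_true = np.array([1, 2, 0])
print(top_k_accuracy(y_true, scores, k=1))   # ~0.67: the first sample misses at top-1
print(top_k_accuracy(y_true, scores, k=2))   # 1.0: every true label is in the top 2
```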
9. Regression: MAE vs RMSE. (Medium)
Answer: MAE = mean |error|—robust, same units as target. RMSE penalizes large errors more—sensitive to outliers.
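A tiny numeric example (made-up targets) showing how a single large error moves RMSE far more than MAE:

```python
import numpy as np

y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_pred = np.array([2.5, 5.0, 2.0, 17.0])    # one 10-unit outlier error

errors = y_pred - y_true
mae  = np.mean(np.abs(errors))              # 2.625: linear in each error
rmse = np.sqrt(np.mean(errors ** 2))        # ~5.01: the squared outlier dominates
print(mae, rmse)
```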
10. Log loss (cross-entropy) as metric. (Medium)
Answer: Penalizes confident wrong probabilities—measures calibration + discrimination; better than accuracy when you need probabilistic quality.
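A from-scratch binary cross-entropy with clipping, on made-up probabilities, showing how confident mistakes are punished:

```python
import numpy as np

def log_loss(y_true, p, eps=1e-15):
    """Binary cross-entropy; clip probabilities so log(0) never occurs."""
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

y_true = np.array([1, 0, 1, 0])
confident_right = np.array([0.95, 0.05, 0.9, 0.1])
confident_wrong = np.array([0.05, 0.95, 0.9, 0.1])   # first two are confidently wrong
print(log_loss(y_true, confident_right))   # ~0.08
print(log_loss(y_true, confident_wrong))   # ~1.55: confident errors cost a lot
```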
11. Brier score (one line). (Hard)
Answer: Mean squared error between predicted probability and outcome—for binary, measures calibration and sharpness together.
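A one-liner on made-up probabilities; note that an uninformative predict-0.5-everywhere model is calibrated but not sharp:

```python
import numpy as np

def brier_score(y_true, p):
    """Mean squared error between predicted probability and the 0/1 outcome."""
    return np.mean((p - y_true) ** 2)

y_true = np.array([1, 0, 1, 0])
print(brier_score(y_true, np.array([0.9, 0.1, 0.8, 0.2])))   # 0.025: sharp and well calibrated
print(brier_score(y_true, np.array([0.5, 0.5, 0.5, 0.5])))   # 0.25: hedging everything
```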
12. mAP in detection (concept). (Hard)
Answer: Average precision per class over IoU thresholds / recall levels, then mean across classes—standard object-detection summary metric.
13. Choosing classification threshold. (Medium)
Answer: Tune on a validation set to maximize F1, meet a minimum recall, or align with the business cost of FP vs FN; the default 0.5 is not always right.
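A hypothetical validation-set sweep that picks the F1-maximizing threshold; the same loop works for other criteria, e.g. a minimum-recall constraint:

```python
import numpy as np

def best_f1_threshold(y_true, scores, thresholds=np.linspace(0.05, 0.95, 19)):
    """Sweep candidate thresholds on validation data and keep the best-F1 one."""
    best_t, best_f1 = 0.5, -1.0
    for t in thresholds:
        y_pred = (scores >= t).astype(int)
        tp = np.sum((y_pred == 1) & (y_true == 1))
        fp = np.sum((y_pred == 1) & (y_true == 0))
        fn = np.sum((y_pred == 0) & (y_true == 1))
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1

# Made-up validation scores: the best threshold lands near 0.55, not 0.5.
y_val = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 0])
s_val = np.array([0.1, 0.2, 0.15, 0.3, 0.25, 0.4, 0.35, 0.6, 0.7, 0.5])
print(best_f1_threshold(y_val, s_val))
```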
14. Why report a naive baseline? (Easy)
Answer: 95% accuracy means little if the majority class makes up 94% of the data; the majority-class classifier sets the floor to beat.
15. Which metric for fraud detection (brief)? (Medium)
Answer: Often optimize recall at a fixed precision, or PR-AUC: positives are rare, the cost of a FN is high, and accuracy alone is misleading.
Tie metric to business cost of FP vs FN.
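A minimal sketch of "maximize recall subject to a precision floor", scanning thresholds at every distinct score; the data below is invented:

```python
import numpy as np

def max_recall_at_precision(y_true, scores, min_precision=0.9):
    """Highest recall achievable while keeping precision >= min_precision."""
    best_recall, best_t = 0.0, None
    for t in np.unique(scores):
        y_pred = (scores >= t).astype(int)
        tp = np.sum((y_pred == 1) & (y_true == 1))
        fp = np.sum((y_pred == 1) & (y_true == 0))
        fn = np.sum((y_pred == 0) & (y_true == 1))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision >= min_precision and recall > best_recall:
            best_recall, best_t = recall, t
    return best_recall, best_t

# Hypothetical fraud scores: fraud cases are rare but tend to score high.
y = np.array([0] * 20 + [1] * 3)
s = np.concatenate([np.linspace(0.0, 0.6, 20), [0.7, 0.8, 0.55]])
print(max_recall_at_precision(y, s, min_precision=0.9))   # catches 2 of 3 frauds at precision >= 0.9
```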

Quick review checklist

  • Accuracy limits; precision, recall, F1; confusion matrix.
  • ROC-AUC vs PR; macro vs micro; threshold choice.
  • Regression MAE/RMSE; log loss; know your baseline.