
Evaluation Metrics — 15 Interview Questions

Accuracy pitfalls, precision/recall trade-offs, ROC vs PR curves, and choosing metrics for imbalanced or multi-class problems.


Tags: Accuracy, F1, AUC, Confusion matrix
1. Define accuracy. (Easy)
Answer: (TP+TN) / total, the fraction of correct predictions. Misleading when classes are imbalanced: a model that always predicts the majority class still scores high.
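A minimal sketch with made-up labels (NumPy assumed) showing how accuracy can flatter a useless model on imbalanced data:

```python
import numpy as np

# Hypothetical imbalanced labels: 95 negatives, 5 positives.
y_true = np.array([0] * 95 + [1] * 5)
y_pred = np.zeros_like(y_true)              # always predict the majority class

accuracy = np.mean(y_true == y_pred)        # (TP + TN) / total
print(f"majority-class accuracy: {accuracy:.2f}")  # 0.95, yet zero positives are found
```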
2. Precision vs recall. (Easy)
Answer: Precision = TP/(TP+FP): of the predicted positives, how many are actually positive. Recall = TP/(TP+FN): of the actual positives, how many were found. Moving the decision threshold usually trades one for the other.
Precision = TP/(TP+FP),  Recall = TP/(TP+FN)
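A small from-scratch sketch, on made-up binary labels and predictions, computing both from TP/FP/FN counts:

```python
import numpy as np

def precision_recall(y_true, y_pred):
    """Precision = TP/(TP+FP), Recall = TP/(TP+FN) for 0/1 labels."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 1, 0, 1, 0])
print(precision_recall(y_true, y_pred))   # TP=3, FP=1, FN=1 -> (0.75, 0.75)
```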
3. F1 score. (Easy)
Answer: Harmonic mean of precision and recall: 2PR/(P+R)—penalizes models that are strong on only one; common for imbalanced binary tasks.
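A one-function sketch of the harmonic mean; the numbers are illustrative only:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall: 2PR / (P + R)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(f1_score(0.90, 0.90))   # 0.90: balanced model
print(f1_score(0.99, 0.10))   # ~0.18: the weak side drags the score down
```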
4. Confusion matrix. (Easy)
Answer: Rows/columns for true vs predicted classes; read off TP, FP, TN, FN for binary; extends to multi-class with diagonal = correct.
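A from-scratch multi-class confusion matrix on made-up labels (rows = true class, columns = predicted class):

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Count (true, predicted) pairs; the diagonal holds the correct predictions."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

y_true = [0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0, 2]
print(confusion_matrix(y_true, y_pred, n_classes=3))
# [[1 1 0]
#  [0 2 0]
#  [1 0 2]]
```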
5. ROC curve and AUC. (Medium)
Answer: Plot TPR vs FPR as threshold varies. AUC = area—ranking quality; 0.5 random, 1 perfect. Useful when you care about discrimination across thresholds.
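One way to sketch AUC without plotting the curve: it equals the probability that a randomly chosen positive outranks a randomly chosen negative (ties count as half). The scores below are made up:

```python
import numpy as np

def roc_auc(y_true, scores):
    """AUC as the pairwise ranking statistic: P(positive score > negative score)."""
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

y_true = np.array([0, 0, 1, 1, 0, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7])
print(roc_auc(y_true, scores))   # 8 of 9 positive/negative pairs ranked correctly -> ~0.89
```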
6. PR curve vs ROC when imbalanced. (Medium)
Answer: PR curve (precision vs recall) often more informative with rare positives—ROC can look optimistic because FPR is dominated by negatives.
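A quick illustration on synthetic, heavily imbalanced data (assumes scikit-learn is available; exact numbers depend on the random seed):

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

rng = np.random.default_rng(0)

# 1% positives; score distributions overlap between the classes.
y_true = np.concatenate([np.zeros(9900), np.ones(100)])
scores = np.concatenate([rng.normal(0.0, 1.0, 9900), rng.normal(1.5, 1.0, 100)])

print("ROC-AUC:", round(roc_auc_score(y_true, scores), 3))            # looks strong
print("PR-AUC :", round(average_precision_score(y_true, scores), 3))  # much lower: precision is hurt by the many negatives
```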
7. Macro vs micro F1 (multi-class). (Hard)
Answer: Macro: compute F1 per class and average the scores equally, which emphasizes rare classes. Micro: aggregate TP/FP/FN globally, then compute P/R/F1, which is dominated by frequent classes.
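A from-scratch comparison on a made-up two-class example where one class is rare; macro drops while micro stays high:

```python
import numpy as np

def counts(y_true, y_pred, cls):
    tp = np.sum((y_pred == cls) & (y_true == cls))
    fp = np.sum((y_pred == cls) & (y_true != cls))
    fn = np.sum((y_pred != cls) & (y_true == cls))
    return tp, fp, fn

def f1_from_counts(tp, fp, fn):
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

def macro_micro_f1(y_true, y_pred, classes):
    per_class = [counts(y_true, y_pred, c) for c in classes]
    macro = np.mean([f1_from_counts(*c) for c in per_class])   # each class weighted equally
    tp, fp, fn = np.sum(per_class, axis=0)                     # pool counts, then one F1
    return macro, f1_from_counts(tp, fp, fn)

y_true = np.array([0] * 8 + [1] * 2)    # class 1 is rare
y_pred = np.array([0] * 8 + [0, 1])     # misses half of the rare class
print(macro_micro_f1(y_true, y_pred, classes=[0, 1]))   # macro ~0.80, micro 0.90
```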
8. Top-k accuracy (classification). (Easy)
Answer: Correct if the true label is in the model’s top k predictions. Softer than top-1; used in ImageNet-style benchmarks.
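A short sketch with made-up per-class scores (NumPy assumed):

```python
import numpy as np

def top_k_accuracy(y_true, scores, k):
    """Correct if the true label is among the k highest-scoring classes per sample."""
    top_k = np.argsort(scores, axis=1)[:, -k:]    # indices of the k largest scores
    return np.mean([y in row for y, row in zip(y_true, top_k)])

scores = np.array([[0.5, 0.3, 0.2],
                   [0.1, 0.2, 0.7],
                   [0.6, 0.3, 0.1]])
y_true = np.array([1, 2, 0])
print(top_k_accuracy(y_true, scores, k=1))   # ~0.67: the first sample misses at top-1
print(top_k_accuracy(y_true, scores, k=2))   # 1.0: every true label is in the top 2
```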
9. Regression: MAE vs RMSE. (Medium)
Answer: MAE = mean |error|—robust, same units as target. RMSE penalizes large errors more—sensitive to outliers.
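A tiny numeric example (made-up targets) showing how a single large error moves RMSE far more than MAE:

```python
import numpy as np

y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_pred = np.array([2.5, 5.0, 2.0, 17.0])    # one 10-unit outlier error

errors = y_pred - y_true
mae  = np.mean(np.abs(errors))              # 2.625: linear in each error
rmse = np.sqrt(np.mean(errors ** 2))        # ~5.01: the squared outlier dominates
print(mae, rmse)
```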
10. Log loss (cross-entropy) as metric. (Medium)
Answer: Penalizes confident wrong probabilities—measures calibration + discrimination; better than accuracy when you need probabilistic quality.
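A from-scratch binary cross-entropy with clipping, on made-up probabilities, showing how confident mistakes are punished:

```python
import numpy as np

def log_loss(y_true, p, eps=1e-15):
    """Binary cross-entropy; clip probabilities so log(0) never occurs."""
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

y_true = np.array([1, 0, 1, 0])
confident_right = np.array([0.95, 0.05, 0.9, 0.1])
confident_wrong = np.array([0.05, 0.95, 0.9, 0.1])   # first two are confidently wrong
print(log_loss(y_true, confident_right))   # ~0.08
print(log_loss(y_true, confident_wrong))   # ~1.55: confident errors cost a lot
```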
11. Brier score (one line). (Hard)
Answer: Mean squared error between predicted probability and outcome—for binary, measures calibration and sharpness together.
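A one-liner on made-up probabilities; note that an uninformative predict-0.5-everywhere model is calibrated but not sharp:

```python
import numpy as np

def brier_score(y_true, p):
    """Mean squared error between predicted probability and the 0/1 outcome."""
    return np.mean((p - y_true) ** 2)

y_true = np.array([1, 0, 1, 0])
print(brier_score(y_true, np.array([0.9, 0.1, 0.8, 0.2])))   # 0.025: sharp and well calibrated
print(brier_score(y_true, np.array([0.5, 0.5, 0.5, 0.5])))   # 0.25: hedging everything
```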
12. mAP in detection (concept). (Hard)
Answer: Average precision per class over IoU thresholds / recall levels, then mean across classes—standard object-detection summary metric.
13. Choosing classification threshold. (Medium)
Answer: Tune on a validation set to maximize F1, meet a minimum recall, or align with the business cost of FP vs FN; the default 0.5 is not always right.
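A hypothetical validation-set sweep that picks the F1-maximizing threshold; the same loop works for other criteria, e.g. a minimum-recall constraint:

```python
import numpy as np

def best_f1_threshold(y_true, scores, thresholds=np.linspace(0.05, 0.95, 19)):
    """Sweep candidate thresholds on validation data and keep the best-F1 one."""
    best_t, best_f1 = 0.5, -1.0
    for t in thresholds:
        y_pred = (scores >= t).astype(int)
        tp = np.sum((y_pred == 1) & (y_true == 1))
        fp = np.sum((y_pred == 1) & (y_true == 0))
        fn = np.sum((y_pred == 0) & (y_true == 1))
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1

# Made-up validation scores: the best threshold lands near 0.55, not 0.5.
y_val = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 0])
s_val = np.array([0.1, 0.2, 0.15, 0.3, 0.25, 0.4, 0.35, 0.6, 0.7, 0.5])
print(best_f1_threshold(y_val, s_val))
```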
14. Why report a naive baseline? (Easy)
Answer: 95% accuracy means little if the majority class makes up 94% of the data; the majority-class classifier sets the floor to beat.
15. Which metric for fraud detection (brief)? (Medium)
Answer: Often optimize recall at a fixed precision, or PR-AUC: positives are rare, the cost of a FN is high, and accuracy alone is misleading.
Tie metric to business cost of FP vs FN.
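A minimal sketch of "maximize recall subject to a precision floor", scanning thresholds at every distinct score; the data below is invented:

```python
import numpy as np

def max_recall_at_precision(y_true, scores, min_precision=0.9):
    """Highest recall achievable while keeping precision >= min_precision."""
    best_recall, best_t = 0.0, None
    for t in np.unique(scores):
        y_pred = (scores >= t).astype(int)
        tp = np.sum((y_pred == 1) & (y_true == 1))
        fp = np.sum((y_pred == 1) & (y_true == 0))
        fn = np.sum((y_pred == 0) & (y_true == 1))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision >= min_precision and recall > best_recall:
            best_recall, best_t = recall, t
    return best_recall, best_t

# Hypothetical fraud scores: fraud cases are rare but tend to score high.
y = np.array([0] * 20 + [1] * 3)
s = np.concatenate([np.linspace(0.0, 0.6, 20), [0.7, 0.8, 0.55]])
print(max_recall_at_precision(y, s, min_precision=0.9))   # catches 2 of 3 frauds at precision >= 0.9
```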

Quick review checklist

  • Accuracy limits; precision, recall, F1; confusion matrix.
  • ROC-AUC vs PR; macro vs micro; threshold choice.
  • Regression MAE/RMSE; log loss; know your baseline.