
CV Evaluation Metrics: 20 Essential Q&A

Classification, detection, and segmentation—what to report and common pitfalls.

~12 min read · 20 questions · Intermediate
Tags: IoU · mAP · ROC · F1
1 Why care about metrics? ⚡ easy
Answer: They define success criteria, enable model comparison, and expose tradeoffs (precision vs recall); the wrong metric optimizes for the wrong behavior.
2 When is accuracy misleading? 📊 medium
Answer: On imbalanced classes: a dataset that is 99% negatives lets an always-negative model score 99% accuracy, so report per-class and balanced metrics instead.
3 Define precision and recall. 📊 medium
Answer: Precision = TP/(TP+FP); Recall = TP/(TP+FN)—tension controlled by decision threshold.
4 F1? ⚡ easy
Answer: Harmonic mean of precision and recall; it penalizes ignoring either. A common single-number summary for binary tasks, with macro-F1 the usual multiclass extension.
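A minimal sketch of how the three quantities relate, assuming binary counts tp, fp, fn have already been tallied (the helper name is illustrative):

def precision_recall_f1(tp: int, fp: int, fn: int):
    """Precision, recall, and F1 from binary confusion counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

# Example: 80 true positives, 20 false positives, 40 false negatives
print(precision_recall_f1(80, 20, 40))  # (0.8, 0.666..., 0.727...)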
5 Confusion matrix? 📊 medium
Answer: Counts predictions vs truth for all classes—shows confusion pairs and supports per-class recall.
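A small sketch of building the matrix and reading per-class recall off its rows, assuming integer class ids:

import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """Rows = ground-truth class, columns = predicted class."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

cm = confusion_matrix([0, 0, 1, 2, 2], [0, 1, 1, 2, 0], num_classes=3)
per_class_recall = cm.diagonal() / cm.sum(axis=1)  # TP / (TP + FN) per row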
6 ROC / AUC? 🔥 hard
Answer: The TPR vs FPR curve as the decision threshold sweeps; AUC summarizes ranking quality and is insensitive to class prevalence, which makes it useful for comparing rankers.
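If scikit-learn is available, the curve and AUC can be computed directly from scores (a sketch; the threshold sweep happens inside roc_curve):

from sklearn.metrics import roc_auc_score, roc_curve

y_true = [0, 0, 1, 1, 1, 0]
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.2]   # model confidences, not hard labels

fpr, tpr, thresholds = roc_curve(y_true, scores)
print(roc_auc_score(y_true, scores))  # ranking quality, 0.5 = chance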
7 What is IoU? 📊 medium
Answer: Intersection over union of predicted vs ground-truth boxes/masks—range [0,1]; standard match criterion in detection.
# IoU from precomputed areas of the two regions and their intersection
iou = inter_area / (area_a + area_b - inter_area)
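Fleshed out into a runnable sketch for axis-aligned boxes, assuming the (x1, y1, x2, y2) corner convention:

def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

print(box_iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.143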
8 What is mAP in detection? 🔥 hard
Answer: Mean AP across classes; AP is the area under the precision–recall curve after matching detections to ground truth at an IoU threshold. COCO additionally averages AP over IoU thresholds 0.50 to 0.95.
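A compact sketch of AP for one class at a single IoU threshold, assuming detections are already sorted by confidence and matched to ground truth (tp is a 0/1 array, num_gt the ground-truth count):

import numpy as np

def average_precision(tp, num_gt):
    """AP for one class: area under the precision-recall curve (monotone precision envelope)."""
    tp = np.asarray(tp, dtype=float)                  # 1 = detection matched a GT box, else 0
    fp = 1.0 - tp
    cum_tp, cum_fp = np.cumsum(tp), np.cumsum(fp)
    recall = cum_tp / max(num_gt, 1)
    precision = cum_tp / (cum_tp + cum_fp)
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([0.0], precision, [0.0]))
    p = np.maximum.accumulate(p[::-1])[::-1]          # enforce non-increasing precision
    return float(np.sum((r[1:] - r[:-1]) * p[1:]))    # integrate precision over recall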
9 AP vs mAP? 📊 medium
Answer: AP per class; mAP averages classes—report AP50 vs AP75 to show coarse vs tight localization skill.
10 NMS effect on metrics? 📊 medium
Answer: Suppresses overlapping boxes before evaluation—metric implementation must match competition rules (soft-NMS differs).
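A greedy NMS sketch in plain Python, reusing a box_iou helper like the one above (real pipelines use vectorized or library implementations):

def nms(boxes, scores, iou_thresh=0.5):
    """Keep highest-scoring boxes; drop any box overlapping an already-kept one above iou_thresh."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(box_iou(boxes[i], boxes[j]) < iou_thresh for j in keep):
            keep.append(i)
    return keep  # indices of surviving detections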
11 Segmentation IoU? 📊 medium
Answer: Per-class IoU on pixels; mean IoU (mIoU) across classes—ignore void label per dataset protocol.
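A per-class pixel IoU sketch over integer label maps; the ignore id of 255 is an assumption, follow the dataset protocol:

import numpy as np

def mean_iou(pred, gt, num_classes, ignore_label=255):
    """Mean of per-class pixel IoU, skipping pixels labeled ignore_label in the ground truth."""
    valid = gt != ignore_label
    ious = []
    for c in range(num_classes):
        p, g = (pred == c) & valid, (gt == c) & valid
        union = np.logical_or(p, g).sum()
        if union:  # skip classes absent from both prediction and ground truth
            ious.append(np.logical_and(p, g).sum() / union)
    return float(np.mean(ious)) if ious else 0.0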
12 Dice coefficient? 📊 medium
Answer: 2|A∩B|/(|A|+|B|)—related to F1 on masks; common in medical segmentation with class imbalance.
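A Dice sketch for binary masks, assuming boolean numpy arrays:

import numpy as np

def dice(pred_mask, gt_mask, eps=1e-7):
    """Dice = 2|A∩B| / (|A| + |B|); eps avoids division by zero on empty masks."""
    inter = np.logical_and(pred_mask, gt_mask).sum()
    return (2.0 * inter + eps) / (pred_mask.sum() + gt_mask.sum() + eps)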
13 Threshold tuning? 🔥 hard
Answer: Pick operating point on validation to meet product constraint (min recall)—don’t tune on test set.
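A sketch of choosing the operating threshold on validation scores to satisfy a minimum-recall floor (the 0.9 floor is an illustrative product constraint):

import numpy as np

def pick_threshold(scores, labels, min_recall=0.9):
    """Return the highest validation threshold whose recall still meets min_recall."""
    scores, labels = np.asarray(scores), np.asarray(labels)
    for t in np.sort(np.unique(scores))[::-1]:       # sweep high -> low confidence
        preds = scores >= t
        tp = np.sum(preds & (labels == 1))
        recall = tp / max(labels.sum(), 1)
        if recall >= min_recall:
            return float(t)
    return float(scores.min())                       # fall back to accepting everything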
14 Micro vs macro averaging? 🔥 hard
Answer: Micro pools all examples; macro averages per-class stats—macro highlights rare class performance.
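With scikit-learn the difference is just the average argument; a sketch where the rare class 2 is never predicted, so macro drops while micro barely moves:

from sklearn.metrics import f1_score

y_true = [0, 0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 0, 1, 1, 1, 0]   # the rare class 2 is missed entirely

print(f1_score(y_true, y_pred, average="micro"))  # pools all examples
print(f1_score(y_true, y_pred, average="macro"))  # rare-class failure drags this down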
15 OKS in pose? 🔥 hard
Answer: Object Keypoint Similarity scales keypoint error by object scale and per-keypoint constants; COCO pose AP thresholds OKS the same way detection AP thresholds IoU.
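A sketch following the COCO definition for one instance: d are per-keypoint pixel errors, s the object scale (square root of the segment area), k the per-keypoint constants, v the visibility flags:

import numpy as np

def oks(d, s, k, v):
    """Object Keypoint Similarity: mean of exp(-d_i^2 / (2 s^2 k_i^2)) over labeled keypoints."""
    d, k, v = np.asarray(d, float), np.asarray(k, float), np.asarray(v)
    sim = np.exp(-d**2 / (2 * s**2 * k**2))
    labeled = v > 0
    return float(sim[labeled].mean()) if labeled.any() else 0.0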
16 Calibration? 📊 medium
Answer: Predicted probabilities match empirical frequencies—ECE, reliability diagrams; miscalibration hurts downstream decisions.
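A binned ECE sketch; the 15 equal-width confidence bins are an assumption, and reliability diagrams plot the same per-bin quantities:

import numpy as np

def expected_calibration_error(confidences, correct, n_bins=15):
    """Weighted mean |accuracy - confidence| gap over equal-width confidence bins."""
    confidences, correct = np.asarray(confidences), np.asarray(correct, float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap   # weight by the fraction of samples in the bin
    return ece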
17 Sampling bias? 📊 medium
Answer: Geographic, demographic, or capture bias inflates benchmark scores—report subgroup metrics.
18 Benchmark leakage? ⚡ easy
Answer: Test images in pretraining data or duplicate near-neighbors—contaminates leaderboard comparisons.
19 Human baseline? ⚡ easy
Answer: Annotator agreement sets ceiling—if model beats humans, check task ambiguity or evaluation bugs.
20 What to report? 📊 medium
Answer: Primary metric + confidence intervals or multiple seeds, compute budget, and failure cases—not leaderboard cherry-picking.

Metrics Cheat Sheet

Classification
  • P / R / F1
Detection
  • IoU
  • mAP
Segmentation
  • mIoU / Dice

💡 Pro tip: mAP is not one number—cite AP50/75 and dataset rules.
