Computer Vision Interview
20 essential Q&A
Updated 2026
Metrics
CV Evaluation Metrics: 20 Essential Q&A
Classification, detection, and segmentation—what to report and common pitfalls.
~12 min read
20 questions
Intermediate
IoU · mAP · ROC · F1
Quick Navigation
1
Why care about metrics?
⚡ easy
Answer: They define success criteria, enable model comparison, and expose tradeoffs (precision vs recall); the wrong metric optimizes for the wrong behavior.
2
When is accuracy misleading?
📊 medium
Answer: Under class imbalance: with 99% negatives, a model that always predicts negative scores 99% accuracy while finding nothing; report per-class and balanced metrics instead.
3
Define precision and recall.
📊 medium
Answer: Precision = TP/(TP+FP); Recall = TP/(TP+FN)—tension controlled by decision threshold.
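A minimal sketch of both formulas from raw counts (the tp/fp/fn values are purely illustrative):

# Precision and recall from true-positive, false-positive, and false-negative counts.
tp, fp, fn = 90, 10, 30
precision = tp / (tp + fp)   # fraction of predicted positives that are correct
recall = tp / (tp + fn)      # fraction of actual positives that are found
print(precision, recall)     # 0.9 0.75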
4
F1?
⚡ easy
Answer: Harmonic mean of precision and recall; it penalizes ignoring either and is the common single-number summary (macro-F1 in the multiclass case).
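A tiny illustration with assumed precision/recall values:

# F1 as the harmonic mean of precision and recall.
precision, recall = 0.9, 0.75        # illustrative values
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 3))                  # 0.818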
5
Confusion matrix?
📊 medium
Answer: Counts predictions vs truth for all classes—shows confusion pairs and supports per-class recall.
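A possible sketch using scikit-learn's confusion_matrix on toy labels (rows are ground truth, columns are predictions):

from sklearn.metrics import confusion_matrix

y_true = [0, 0, 1, 1, 2, 2]                  # toy labels, purely illustrative
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred)        # rows = truth, columns = prediction
per_class_recall = cm.diagonal() / cm.sum(axis=1)
print(cm)                                    # [[1 1 0] [0 2 0] [1 0 1]]
print(per_class_recall)                      # [0.5 1.0 0.5]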
6
ROC / AUC?
🔥 hard
Answer: Curve of TPR vs FPR as the decision threshold sweeps; AUC summarizes ranking quality and is insensitive to class prevalence when comparing rankers.
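One way to sketch this with scikit-learn; the labels and scores below are invented for illustration:

from sklearn.metrics import roc_auc_score, roc_curve

y_true = [0, 0, 1, 0, 1, 1]                        # toy binary labels
y_score = [0.1, 0.4, 0.35, 0.8, 0.65, 0.9]         # predicted probabilities
fpr, tpr, thresholds = roc_curve(y_true, y_score)  # one point per threshold
print(round(roc_auc_score(y_true, y_score), 3))    # 0.667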
7
What is IoU?
📊 medium
Answer: Intersection over union of predicted vs ground-truth boxes/masks—range [0,1]; standard match criterion in detection.
iou = inter_area / (area_a + area_b - inter_area)
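A possible expansion of that formula into a small function for axis-aligned boxes in [x1, y1, x2, y2] format (the function name and box convention are assumptions, not from the original):

def box_iou(a, b):
    # a, b: axis-aligned boxes as [x1, y1, x2, y2] with x1 < x2 and y1 < y2.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

print(box_iou([0, 0, 2, 2], [1, 1, 3, 3]))   # 1 / 7 ≈ 0.143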
8
What is mAP in detection?
🔥 hard
Answer: Mean AP across classes—AP is area under precision–recall curve after IoU-thresholded matches; COCO averages multiple IoU thresholds.
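A single-class AP sketch, assuming detections have already been matched to ground truth at one fixed IoU threshold and sorted by descending confidence; it uses the all-point interpolated precision envelope, whereas the official COCO evaluator samples 101 recall points:

import numpy as np

def average_precision(is_tp, num_gt):
    # is_tp: 1/0 flags for sorted detections; num_gt: number of ground-truth boxes.
    is_tp = np.asarray(is_tp, dtype=float)
    tp = np.cumsum(is_tp)
    fp = np.cumsum(1.0 - is_tp)
    recall = tp / num_gt
    precision = tp / (tp + fp)
    # Monotone (interpolated) precision envelope, then area under the PR curve.
    precision = np.maximum.accumulate(precision[::-1])[::-1]
    recall = np.concatenate(([0.0], recall))
    return float(np.sum((recall[1:] - recall[:-1]) * precision))

print(average_precision([1, 1, 0, 1, 0], num_gt=4))   # 0.6875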
9
AP vs mAP?
📊 medium
Answer: AP is computed per class; mAP averages it over classes. Report AP50 vs AP75 (IoU thresholds 0.5 and 0.75) to show coarse vs tight localization skill.
10
NMS effect on metrics?
📊 medium
Answer: Suppresses overlapping boxes before evaluation—metric implementation must match competition rules (soft-NMS differs).
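A minimal greedy NMS sketch in NumPy (the iou_thr default of 0.5 is illustrative; variants such as soft-NMS rescore overlapping boxes instead of discarding them):

import numpy as np

def nms(boxes, scores, iou_thr=0.5):
    # Greedy NMS: repeatedly keep the highest-scoring box and drop remaining
    # boxes whose IoU with it exceeds iou_thr.
    boxes, scores = np.asarray(boxes, float), np.asarray(scores, float)
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        xx1, yy1 = np.maximum(x1[i], x1[rest]), np.maximum(y1[i], y1[rest])
        xx2, yy2 = np.minimum(x2[i], x2[rest]), np.minimum(y2[i], y2[rest])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        iou = inter / (areas[i] + areas[rest] - inter)
        order = rest[iou <= iou_thr]
    return keep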
11
Segmentation IoU?
📊 medium
Answer: Per-class IoU on pixels; mean IoU (mIoU) across classes—ignore void label per dataset protocol.
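A rough mIoU sketch over label maps, assuming void pixels are marked 255 (the ignore_index convention varies by dataset):

import numpy as np

def mean_iou(pred, target, num_classes, ignore_index=255):
    # Per-class pixel IoU averaged over classes present in prediction or truth;
    # pixels labeled ignore_index are excluded from scoring.
    pred, target = np.asarray(pred), np.asarray(target)
    valid = target != ignore_index
    pred, target = pred[valid], target[valid]
    ious = []
    for c in range(num_classes):
        inter = np.sum((pred == c) & (target == c))
        union = np.sum((pred == c) | (target == c))
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))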
12
Dice coefficient?
📊 medium
Answer: 2|A∩B|/(|A|+|B|)—related to F1 on masks; common in medical segmentation with class imbalance.
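A short sketch on boolean masks (the eps guard is only there to avoid division by zero on empty masks):

import numpy as np

def dice(pred_mask, gt_mask, eps=1e-7):
    # pred_mask, gt_mask: boolean arrays of identical shape.
    pred_mask, gt_mask = np.asarray(pred_mask, bool), np.asarray(gt_mask, bool)
    inter = np.sum(pred_mask & gt_mask)
    return 2.0 * inter / (pred_mask.sum() + gt_mask.sum() + eps)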
13
Threshold tuning?
🔥 hard
Answer: Pick operating point on validation to meet product constraint (min recall)—don’t tune on test set.
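A sketch of picking the operating point on validation data only, assuming a minimum-recall product constraint (function name and the 0.90 floor are illustrative):

import numpy as np
from sklearn.metrics import precision_recall_curve

def pick_threshold(y_val, scores_val, min_recall=0.90):
    # Best-precision threshold among operating points that still meet the recall floor.
    precision, recall, thresholds = precision_recall_curve(y_val, scores_val)
    # precision/recall have one more entry than thresholds; drop the final point.
    ok = recall[:-1] >= min_recall
    if not ok.any():
        raise ValueError("no threshold reaches the required recall")
    best = np.argmax(precision[:-1] * ok)
    return thresholds[best]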
14
Micro vs macro averaging?
🔥 hard
Answer: Micro pools all examples; macro averages per-class stats—macro highlights rare class performance.
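A toy comparison with scikit-learn's f1_score, where the rare classes drag the macro score down while micro averaging barely notices:

from sklearn.metrics import f1_score

y_true = [0, 0, 0, 0, 1, 2]          # rare classes 1 and 2, illustrative
y_pred = [0, 0, 0, 0, 0, 2]
print(f1_score(y_true, y_pred, average="micro", zero_division=0))  # ≈ 0.83
print(f1_score(y_true, y_pred, average="macro", zero_division=0))  # ≈ 0.63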
15
OKS in pose?
🔥 hard
Answer: Object Keypoint Similarity scales keypoint error by object scale and a per-joint constant; COCO pose AP thresholds on OKS the way detection AP thresholds on IoU.
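A rough OKS sketch following the COCO-style formula (argument names are illustrative; k stands in for the published per-keypoint constants):

import numpy as np

def oks(pred_kpts, gt_kpts, visible, area, k):
    # pred_kpts, gt_kpts: (K, 2) keypoint coordinates; visible: (K,) mask of
    # labeled keypoints; area: object segment area (s^2); k: (K,) falloff constants.
    pred_kpts, gt_kpts = np.asarray(pred_kpts, float), np.asarray(gt_kpts, float)
    visible, k = np.asarray(visible, bool), np.asarray(k, float)
    d2 = np.sum((pred_kpts - gt_kpts) ** 2, axis=1)
    e = d2 / (2.0 * area * k ** 2 + np.finfo(float).eps)
    return float(np.sum(np.exp(-e) * visible) / max(visible.sum(), 1))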
16
Calibration?
📊 medium
Answer: Predicted probabilities match empirical frequencies—ECE, reliability diagrams; miscalibration hurts downstream decisions.
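A minimal equal-width-bin ECE sketch (bin count and names are illustrative):

import numpy as np

def expected_calibration_error(conf, correct, n_bins=15):
    # conf: confidence of the predicted class; correct: 1/0 whether it was right.
    # ECE = sample-weighted gap between mean confidence and accuracy per bin.
    conf, correct = np.asarray(conf, float), np.asarray(correct, float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - conf[mask].mean())
            ece += mask.mean() * gap
    return ece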
17
Sampling bias?
📊 medium
Answer: Geographic, demographic, or capture bias inflates benchmark scores—report subgroup metrics.
18
Benchmark leakage?
⚡ easy
Answer: Test images in pretraining data or duplicate near-neighbors—contaminates leaderboard comparisons.
19
Human baseline?
⚡ easy
Answer: Annotator agreement sets ceiling—if model beats humans, check task ambiguity or evaluation bugs.
20
What to report?
📊 medium
Answer: Primary metric + confidence intervals or multiple seeds, compute budget, and failure cases—not leaderboard cherry-picking.
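One rough way to get an interval is a percentile bootstrap over per-image scores, sketched below (function name and parameters are illustrative); note it assumes the metric decomposes per image, which set-level metrics like mAP do not.

import numpy as np

def bootstrap_ci(per_image_scores, n_boot=1000, alpha=0.05, seed=0):
    # Percentile bootstrap: resample images with replacement, recompute the
    # mean metric, and take the central (1 - alpha) quantile range.
    rng = np.random.default_rng(seed)
    scores = np.asarray(per_image_scores, float)
    means = [rng.choice(scores, size=len(scores), replace=True).mean()
             for _ in range(n_boot)]
    return np.quantile(means, [alpha / 2, 1 - alpha / 2])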
Metrics Cheat Sheet
Cls
- P / R / F1
Det
- IoU
- mAP
Seg
- mIoU / Dice
💡 Pro tip: mAP is not one number—cite AP50/75 and dataset rules.