Computer Vision Interview 20 essential Q&A Updated 2026

Object Detection Intro: 20 Essential Q&A

Localize and classify objects—metrics, anchors, and the modern detector landscape.

IoU, mAP, NMS, sliding window
1 What is object detection? ⚡ easy
Answer: Predict where objects are (bounding boxes or points) and what they are (class labels)—often multiple objects per image.
2 Common bbox formats? 📊 medium
Answer: (x_min,y_min,x_max,y_max), (cx,cy,w,h), or normalized variants—be consistent when converting and computing IoU.
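The conversions between these formats can be sketched as below. The function names are illustrative (not from any particular library), and boxes are assumed to be plain tuples in pixel coordinates.

```python
def xyxy_to_cxcywh(box):
    """(x_min, y_min, x_max, y_max) -> (cx, cy, w, h)."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2, (y0 + y1) / 2, x1 - x0, y1 - y0)


def cxcywh_to_xyxy(box):
    """(cx, cy, w, h) -> (x_min, y_min, x_max, y_max)."""
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def normalize_xyxy(box, img_w, img_h):
    """Scale pixel corner coordinates into [0, 1] by image width and height."""
    x0, y0, x1, y1 = box
    return (x0 / img_w, y0 / img_h, x1 / img_w, y1 / img_h)
```

Converting everything to one format at the data-loading boundary is a common way to avoid mixing conventions inside IoU or loss code.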
3 Define IoU. 📊 medium
Answer: Intersection area / union area of two boxes (or masks)—0 no overlap, 1 perfect match; used for matching preds to ground truth.
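def iou(a, b):
    # Boxes are (x_min, y_min, width, height).
    xa, ya, wa, ha = a
    xb, yb, wb, hb = b
    # Overlap extent per axis, clamped to 0 when the boxes are disjoint.
    iw = max(0, min(xa + wa, xb + wb) - max(xa, xb))
    ih = max(0, min(ya + ha, yb + hb) - max(ya, yb))
    return iw * ih / (wa * ha + wb * hb - iw * ih + 1e-6)  # epsilon avoids 0/0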
4 TP vs FP for a detection? 📊 medium
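Answer: Match each prediction, in descending confidence order, to a same-class GT box by IoU ≥ threshold: matched = TP; no matching GT, or a GT already claimed by a higher-scoring prediction, = FP; unmatched GT = FN.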
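A minimal sketch of that matching for one image and one class follows. The helper names `iou_xyxy` and `match_detections` are hypothetical, boxes are assumed to be (x_min, y_min, x_max, y_max), and each prediction is matched to its best still-unmatched GT; benchmark protocols differ in such details, so treat this as an illustration rather than any benchmark's reference code.

```python
def iou_xyxy(a, b):
    """IoU of two (x_min, y_min, x_max, y_max) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_detections(preds, gts, iou_thr=0.5):
    """preds: list of (score, box); gts: list of boxes. Returns (tp, fp, fn)."""
    matched = set()  # indices of GT boxes already claimed
    tp = fp = 0
    # Higher-confidence predictions get first pick of the GT boxes.
    for _, box in sorted(preds, key=lambda p: p[0], reverse=True):
        best_iou, best_j = 0.0, None
        for j, gt in enumerate(gts):
            if j in matched:
                continue
            v = iou_xyxy(box, gt)
            if v > best_iou:
                best_iou, best_j = v, j
        if best_j is not None and best_iou >= iou_thr:
            matched.add(best_j)
            tp += 1
        else:
            fp += 1  # no overlap, too little overlap, or a duplicate
    fn = len(gts) - len(matched)
    return tp, fp, fn
```

With two GT boxes and three predictions, of which one is exact, one is a near-duplicate of it, and one is far from everything, this returns one TP, two FPs, and one FN.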
5 What is mAP? 🔥 hard
Answer: Mean Average Precision: the mean over classes of per-class AP, where AP is the area under that class's precision–recall curve at a given IoU threshold (commonly 0.5, or averaged over 0.5:0.95 on COCO).
6 PR curve from detections? 📊 medium
Answer: Sort predictions by confidence; vary threshold to trace precision vs recall—AP is area under interpolated PR curve.
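As a rough sketch, AP for one class can be computed from per-detection TP flags as below. It uses all-point interpolation (precision at each recall is replaced by the maximum precision at any higher recall); COCO's evaluator samples the curve at fixed recall points instead, so numbers will differ slightly. The function name and argument layout are assumptions made for illustration.

```python
def average_precision(scores, is_tp, num_gt):
    """scores: confidence per detection; is_tp: TP flag per detection;
    num_gt: number of ground-truth boxes for this class."""
    if num_gt == 0:
        return 0.0
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    # Lowering the confidence threshold admits one more detection per step.
    for i in order:
        if is_tp[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Interpolate: make precision non-increasing as recall grows.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    # Area under the interpolated curve as a sum of rectangles.
    ap, prev_r = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_r) * p
        prev_r = r
    return ap
```

For three detections ranked TP, FP, TP against two GT boxes, the interpolated curve gives AP = 0.5 × 1 + 0.5 × 2/3 ≈ 0.833.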
7 Why NMS? 📊 medium
Answer: Many candidate boxes fire on the same object. NMS keeps the highest-scoring box, suppresses lower-scoring boxes whose IoU with it exceeds a threshold, and repeats on the remainder; variants: soft-NMS, class-aware NMS.
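Greedy NMS can be sketched in a few lines of plain Python. This is the class-agnostic form with an assumed (x_min, y_min, x_max, y_max) box layout; `nms` and `iou_xyxy` are illustrative names, and a production system would normally use a vectorized library routine.

```python
def iou_xyxy(a, b):
    """IoU of two (x_min, y_min, x_max, y_max) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_thr=0.5):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)  # best remaining box is always kept
        keep.append(i)
        # Drop every remaining box that overlaps the kept one too much.
        order = [j for j in order if iou_xyxy(boxes[i], boxes[j]) <= iou_thr]
    return keep
```

For class-aware NMS, one option is to run this function separately on the boxes of each predicted class.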
8 What is an anchor? 📊 medium
Answer: Predefined box prior at a feature map location; network predicts offsets + class—speeds convergence vs pure coordinate regression.
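One widely used offset parameterization (the one popularized by the R-CNN family) shifts the center in units of the anchor size and scales width and height in log space. The sketch below assumes boxes in (cx, cy, w, h) form; `encode` and `decode` are illustrative names, and other detectors use different parameterizations.

```python
import math


def encode(anchor, gt):
    """Regression targets that map an anchor onto a ground-truth box."""
    ax, ay, aw, ah = anchor
    gx, gy, gw, gh = gt
    return ((gx - ax) / aw, (gy - ay) / ah, math.log(gw / aw), math.log(gh / ah))


def decode(anchor, deltas):
    """Apply predicted offsets to an anchor to recover a box."""
    ax, ay, aw, ah = anchor
    dx, dy, dw, dh = deltas
    return (ax + dx * aw, ay + dy * ah, aw * math.exp(dw), ah * math.exp(dh))
```

All-zero offsets decode to the anchor itself, which is part of why a good prior makes the regression problem easier: the network only has to learn small corrections.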
9 Two-stage vs one-stage? 📊 medium
Answer: Two-stage: propose regions then classify (R-CNN family). One-stage: dense predictions in one pass (YOLO, SSD, RetinaNet)—usually faster, different error profile.
10 Historical sliding window? ⚡ easy
Answer: Exhaustive windows + classifier—prohibitively slow; modern detectors replace with region proposals or dense anchors on feature maps.
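To see why the exhaustive approach is slow, a small generator can enumerate the windows for a single scale. The function name and the square-window assumption are illustrative only.

```python
def sliding_windows(img_w, img_h, win, stride):
    """Yield (x_min, y_min, x_max, y_max) for every square window position."""
    for y in range(0, img_h - win + 1, stride):
        for x in range(0, img_w - win + 1, stride):
            yield (x, y, x + win, y + win)
```

A 640×480 image with a 64-pixel window and stride 8 already yields 3,869 windows at one scale and one aspect ratio, each needing its own classifier call; repeating that over several scales and shapes multiplies the cost.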
11 Multi-scale detection? 📊 medium
Answer: Image/feature pyramids, multi-scale anchors, or FPN so small and large objects are both seen at appropriate resolution.
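Multi-scale anchors can be illustrated by generating boxes of several sizes and aspect ratios at one location. In this sketch `scales` is the side length of the equivalent square and `ratios` is height divided by width; both conventions and the function name are assumptions, and real detectors (especially FPN-based ones) typically assign different scales to different feature levels.

```python
import math


def make_anchors(cx, cy, scales, ratios):
    """Return (x_min, y_min, x_max, y_max) anchors centered at (cx, cy).

    Each anchor has area scale**2 and height/width equal to ratio.
    """
    anchors = []
    for s in scales:
        for r in ratios:
            w = s / math.sqrt(r)
            h = s * math.sqrt(r)
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors
```

Three scales and three ratios give nine anchors per location, so both a small pedestrian and a large vehicle can start from a reasonably close prior.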
12 Why are small objects hard? 📊 medium
Answer: Few pixels on feature maps, weak signal—higher-res inputs, specialized heads, and data augmentation (copy-paste) help.
13 COCO AP@[.5:.95]? 🔥 hard
Answer: Average AP over IoU thresholds 0.50 to 0.95 step 0.05—rewards localization quality, not just 0.5 IoU hits.
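The averaging itself is simple once AP at a single threshold is available. In the sketch below, `ap_at` is a hypothetical callable that returns AP for a given IoU threshold (for example, by re-running matching at that threshold).

```python
def coco_iou_thresholds():
    """The ten thresholds 0.50, 0.55, ..., 0.95."""
    return [round(0.50 + 0.05 * k, 2) for k in range(10)]


def ap_over_iou_range(ap_at):
    """Mean of ap_at(threshold) over the ten COCO-style thresholds."""
    thresholds = coco_iou_thresholds()
    return sum(ap_at(t) for t in thresholds) / len(thresholds)
```

A detector whose boxes all overlap GT at IoU 0.7 would score perfectly at thresholds up to 0.70 and zero above, giving 0.5 overall even though its AP at 0.5 is 1.0.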
14 Role of confidence score? ⚡ easy
Answer: Estimated probability of class (and sometimes objectness); used to rank predictions for the PR curve and for NMS, and to filter out low-score boxes with a threshold.
15 Multi-class boxes? 📊 medium
Answer: Each prediction has C-way softmax (or sigmoid per class for multi-label); match only within same predicted class for mAP.
16 Assigning training targets? 📊 medium
Answer: Match anchors/points to GT by IoU or center rules; positives get box regression targets and class; negatives contribute to objectness / background loss.
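An IoU-based assignment rule can be sketched as follows. The thresholds 0.5 and 0.4 are illustrative values only (detectors choose their own), anchors falling between them are ignored during training, and the function names are hypothetical.

```python
def iou_xyxy(a, b):
    """IoU of two (x_min, y_min, x_max, y_max) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def assign_targets(anchors, gts, pos_thr=0.5, neg_thr=0.4):
    """Per anchor, return (label, gt_index).

    label: 1 = positive, 0 = background, -1 = ignored; gt_index is -1
    unless the anchor is positive.
    """
    out = []
    for a in anchors:
        best_iou, best_j = 0.0, -1
        for j, g in enumerate(gts):
            v = iou_xyxy(a, g)
            if v > best_iou:
                best_iou, best_j = v, j
        if best_iou >= pos_thr:
            out.append((1, best_j))   # regress toward gts[best_j]
        elif best_iou < neg_thr:
            out.append((0, -1))       # contributes only to background loss
        else:
            out.append((-1, -1))      # ambiguous overlap: excluded from the loss
    return out
```

Center-based rules used by anchor-free detectors replace the IoU test with a check on whether a feature-map point falls inside (or near the center of) a GT box.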
17 Many background anchors? 📊 medium
Answer: Extreme imbalance—addressed by sampling (hard negative mining), focal loss, or balanced loss weighting.
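Focal loss handles the imbalance by down-weighting examples the model already classifies well. A per-example binary version is sketched below; the defaults alpha = 0.25 and gamma = 2 follow the values commonly quoted for RetinaNet, and the function name and scalar interface are assumptions for illustration.

```python
import math


def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss for predicted probability p and label y in {0, 1}."""
    p_t = p if y == 1 else 1.0 - p              # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    p_t = min(max(p_t, 1e-7), 1.0)              # keep log() finite
    # (1 - p_t) ** gamma shrinks the loss of confident, correct predictions.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

With gamma = 0 the modulating factor disappears and the expression reduces to alpha-weighted cross-entropy, so the thousands of easy background anchors would again dominate the total loss.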
18 Latency drivers? ⚡ easy
Answer: Backbone depth, input resolution, number of proposals, NMS cost, batch size—profile end-to-end for deployment.
19 Detection vs segmentation? ⚡ easy
Answer: Boxes are coarse; segmentation gives pixel masks—detection often first stage in two-stage instance segmentation.
20 Per-image vs global AP? 📊 medium
Answer: Standard benchmarks aggregate over the whole dataset rather than averaging per-image scores; check whether a metric averages over images or pools all detections before ranking them (COCO pools).

Detection Intro Cheat Sheet

Metrics
  • IoU match
  • mAP
Post
  • NMS
Families
  • Two-stage
  • One-stage

💡 Pro tip: Tie TP/FP to IoU threshold before discussing mAP.
