Computer Vision Interview
Updated 2026
detection
Object Detection Intro: 20 Essential Q&A
Localize and classify objects—metrics, anchors, and the modern detector landscape.
~11 min read
20 questions
Intermediate
IoU, mAP, NMS, sliding window
1
What is object detection?
⚡ easy
Answer: Predict where objects are (bounding boxes or points) and what they are (class labels)—often multiple objects per image.
2
Common bbox formats?
📊 medium
Answer: (x_min,y_min,x_max,y_max), (cx,cy,w,h), or normalized variants—be consistent when converting and computing IoU.
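The conversions between these formats can be sketched as follows. This is a minimal illustration: the function names are hypothetical, and the boxes are assumed to be plain tuples in absolute pixel coordinates unless stated otherwise.

```python
def xyxy_to_cxcywh(box):
    """(x_min, y_min, x_max, y_max) -> (cx, cy, w, h)."""
    x_min, y_min, x_max, y_max = box
    w, h = x_max - x_min, y_max - y_min
    return (x_min + w / 2, y_min + h / 2, w, h)


def cxcywh_to_xyxy(box):
    """(cx, cy, w, h) -> (x_min, y_min, x_max, y_max)."""
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def normalize_xyxy(box, img_w, img_h):
    """Scale absolute pixel corners into the [0, 1] range."""
    x_min, y_min, x_max, y_max = box
    return (x_min / img_w, y_min / img_h, x_max / img_w, y_max / img_h)
```

Converting everything to one format at the data-loading boundary tends to be safer than converting ad hoc inside metric code.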
3
Define IoU.
📊 medium
Answer: Intersection area / union area of two boxes (or masks)—0 no overlap, 1 perfect match; used for matching preds to ground truth.
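def iou(a, b):
    # Boxes are (x, y, w, h), with (x, y) the top-left corner.
    xa, ya, wa, ha = a
    xb, yb, wb, hb = b
    iw = max(0, min(xa + wa, xb + wb) - max(xa, xb))  # overlap width
    ih = max(0, min(ya + ha, yb + hb) - max(ya, yb))  # overlap height
    inter = iw * ih
    return inter / (wa * ha + wb * hb - inter + 1e-6)  # epsilon avoids 0/0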
4
TP vs FP for a detection?
📊 medium
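Answer: Match each prediction to a same-class GT box by IoU ≥ threshold, taking predictions in descending confidence order. A matched prediction is a TP; a prediction with no matching GT, or whose GT was already claimed by a higher-scoring prediction, is an FP; an unmatched GT is an FN.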
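A minimal sketch of that matching for one image and one class is shown below. The names `iou_xyxy` and `match_detections` are hypothetical, boxes are assumed to be (x_min, y_min, x_max, y_max) tuples, and the rule of skipping already-claimed GT boxes is one common convention; benchmark toolkits differ in such details.

```python
def iou_xyxy(a, b):
    """IoU of two (x_min, y_min, x_max, y_max) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_detections(preds, gts, iou_thr=0.5):
    """Label each prediction TP or FP for one image and one class.

    preds: list of (score, box); gts: list of boxes.
    Returns (labels in descending-score order, number of FNs).
    """
    taken = set()  # indices of GT boxes already claimed by a prediction
    labels = []
    for _, box in sorted(preds, key=lambda p: p[0], reverse=True):
        best_iou, best_j = 0.0, None
        for j, gt in enumerate(gts):
            if j in taken:
                continue
            overlap = iou_xyxy(box, gt)
            if overlap > best_iou:
                best_iou, best_j = overlap, j
        if best_j is not None and best_iou >= iou_thr:
            taken.add(best_j)
            labels.append("TP")
        else:
            labels.append("FP")  # duplicate, poorly localized, or spurious
    return labels, len(gts) - len(taken)
```

Each GT box can be claimed only once, so a second confident box on the same object counts as an FP rather than a second TP.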
5
What is mAP?
🔥 hard
Answer: Mean Average Precision over classes: AP integrates precision–recall curve (often at IoU 0.5 or 0.5:0.95 on COCO).
6
PR curve from detections?
📊 medium
Answer: Sort predictions by confidence; vary threshold to trace precision vs recall—AP is area under interpolated PR curve.
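As a rough sketch, AP for one class can be computed from per-detection TP/FP flags as below. The function name is hypothetical, and the all-point interpolation used here is only one convention (COCO-style evaluation samples precision at fixed recall points instead), so exact numbers from a benchmark toolkit may differ.

```python
def average_precision(scores, is_tp, num_gt):
    """Area under the interpolated PR curve for one class.

    scores: confidence per detection; is_tp: TP flag per detection;
    num_gt: total number of ground-truth boxes for the class.
    """
    if num_gt == 0:
        return 0.0
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for i in order:  # lowering the threshold admits one more detection each step
        if is_tp[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Interpolate: precision at recall r becomes the max precision at recall >= r.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    # Sum rectangle areas over the steps where recall increases.
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap
```

mAP is then the mean of this value over classes.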
7
Why NMS?
📊 medium
Answer: Many windows fire on same object—suppress lower-scoring boxes with high IoU to same higher-scoring box; variants: soft-NMS, class-aware NMS.
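Greedy NMS can be sketched in a few lines. This is a plain-Python illustration with hypothetical names, assuming (x_min, y_min, x_max, y_max) boxes and ignoring classes; a class-aware variant would run the same loop separately per class.

```python
def iou_xyxy(a, b):
    """IoU of two (x_min, y_min, x_max, y_max) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_thr=0.5):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)  # highest-scoring box still in play
        keep.append(best)
        # Drop every remaining box that overlaps the kept one too much.
        order = [i for i in order if iou_xyxy(boxes[best], boxes[i]) <= iou_thr]
    return keep
```

Soft-NMS differs in that it decays the scores of overlapping boxes instead of removing them outright.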
8
What is an anchor?
📊 medium
Answer: Predefined box prior at a feature map location; network predicts offsets + class—speeds convergence vs pure coordinate regression.
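The offset idea can be illustrated with the center/size parameterization used by the R-CNN family: center shifts are scaled by the anchor size and width/height are predicted in log space. The function names are hypothetical, and other detectors may use a different encoding.

```python
import math


def encode_box(anchor, gt):
    """Regression targets (tx, ty, tw, th) that map an anchor onto a GT box.

    Both boxes are (cx, cy, w, h).
    """
    ax, ay, aw, ah = anchor
    gx, gy, gw, gh = gt
    return ((gx - ax) / aw, (gy - ay) / ah, math.log(gw / aw), math.log(gh / ah))


def decode_box(anchor, deltas):
    """Apply predicted offsets to an anchor to recover a (cx, cy, w, h) box."""
    ax, ay, aw, ah = anchor
    tx, ty, tw, th = deltas
    return (ax + tx * aw, ay + ty * ah, aw * math.exp(tw), ah * math.exp(th))
```

With all-zero offsets the decoded box is the anchor itself, which is one way to see why a well-chosen prior gives the network an easier starting point.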
9
Two-stage vs one-stage?
📊 medium
Answer: Two-stage: propose regions, then classify and refine them (R-CNN family). One-stage: dense predictions in a single pass (YOLO, SSD, RetinaNet); usually faster, though historically less accurate because of foreground/background imbalance.
10
Historical sliding window?
⚡ easy
Answer: Exhaustive windows + classifier—prohibitively slow; modern detectors replace with region proposals or dense anchors on feature maps.
11
Multi-scale detection?
📊 medium
Answer: Image/feature pyramids, multi-scale anchors, or FPN so small and large objects are both seen at appropriate resolution.
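As a small illustration of multi-scale anchors, the sketch below builds anchor shapes for several pyramid levels. The strides, scales, and ratios are hypothetical values chosen for the example, not settings from any particular detector.

```python
import math


def anchor_shapes(scales, ratios):
    """(w, h) pairs with area scale**2 and aspect ratio w / h = ratio."""
    shapes = []
    for s in scales:
        for r in ratios:
            shapes.append((s * math.sqrt(r), s / math.sqrt(r)))
    return shapes


# Hypothetical pyramid: coarser levels (larger stride) get larger anchors,
# so small objects are handled on high-resolution maps and large ones on coarse maps.
LEVEL_SCALES = {8: [32], 16: [64], 32: [128]}
anchors_per_level = {
    stride: anchor_shapes(scales, ratios=[0.5, 1.0, 2.0])
    for stride, scales in LEVEL_SCALES.items()
}
```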
12
Why are small objects hard?
📊 medium
Answer: Few pixels on feature maps, weak signal—higher-res inputs, specialized heads, and data augmentation (copy-paste) help.
13
COCO AP@[.5:.95]?
🔥 hard
Answer: Average AP over IoU thresholds 0.50 to 0.95 step 0.05—rewards localization quality, not just 0.5 IoU hits.
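The averaging itself is simple once a per-threshold AP is available. In the sketch below, `coco_style_ap` and `thresholds_passed` are hypothetical helper names, and the per-threshold AP values are assumed to come from a separate evaluation step.

```python
# The ten IoU thresholds 0.50, 0.55, ..., 0.95.
COCO_IOU_THRESHOLDS = [round(0.50 + 0.05 * k, 2) for k in range(10)]


def coco_style_ap(ap_at_threshold):
    """Mean of per-threshold AP; ap_at_threshold maps IoU threshold -> AP."""
    total = sum(ap_at_threshold[t] for t in COCO_IOU_THRESHOLDS)
    return total / len(COCO_IOU_THRESHOLDS)


def thresholds_passed(iou_value):
    """How many of the ten thresholds a single match with this IoU satisfies."""
    return sum(iou_value >= t for t in COCO_IOU_THRESHOLDS)
```

A box that overlaps its GT with IoU just above 0.5 counts as a hit at only one of the ten thresholds, while a tightly localized box counts at most of them; this is how the metric rewards localization quality.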
14
Role of confidence score?
⚡ easy
Answer: Estimated probability of the class (sometimes combined with objectness). It is used to rank predictions for the PR curve, to order boxes during NMS, and to drop low-scoring boxes via a score threshold.
15
Multi-class boxes?
📊 medium
Answer: Each prediction has C-way softmax (or sigmoid per class for multi-label); match only within same predicted class for mAP.
16
Assigning training targets?
📊 medium
Answer: Match anchors/points to GT by IoU or center rules; positives get box regression targets and class; negatives contribute to objectness / background loss.
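A minimal sketch of IoU-based assignment follows. The names are hypothetical, boxes are assumed to be (x_min, y_min, x_max, y_max), and the 0.5 / 0.4 thresholds are illustrative defaults; anchors that fall between the two thresholds are marked as ignored, a convention some detectors use and others do not.

```python
def iou_xyxy(a, b):
    """IoU of two (x_min, y_min, x_max, y_max) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def assign_targets(anchors, gts, pos_thr=0.5, neg_thr=0.4):
    """One label per anchor: GT index (positive), -1 (background), -2 (ignored)."""
    labels = []
    for anchor in anchors:
        overlaps = [iou_xyxy(anchor, gt) for gt in gts]
        best = max(range(len(gts)), key=lambda j: overlaps[j]) if gts else None
        if best is not None and overlaps[best] >= pos_thr:
            labels.append(best)  # positive: regress toward gts[best]
        elif best is None or overlaps[best] < neg_thr:
            labels.append(-1)    # negative: background / objectness loss only
        else:
            labels.append(-2)    # ambiguous overlap: excluded from the loss
    return labels
```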
17
Many background anchors?
📊 medium
Answer: Extreme imbalance—addressed by sampling (hard negative mining), focal loss, or balanced loss weighting.
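Focal loss, for example, scales the cross-entropy of each prediction by (1 - p_t)^gamma so that easy, well-classified background anchors contribute very little. The sketch below is a scalar, plain-Python version with a hypothetical function name; alpha = 0.25 and gamma = 2 are the defaults reported for RetinaNet, and a real implementation would operate on tensors of logits.

```python
import math


def binary_focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-12):
    """Focal loss for one prediction.

    p: predicted foreground probability; y: 1 for foreground, 0 for background.
    """
    p_t = p if y == 1 else 1.0 - p          # probability given to the true label
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(max(p_t, eps))


easy_negative = binary_focal_loss(0.01, 0)  # confident and correct background
hard_negative = binary_focal_loss(0.90, 0)  # background scored as foreground
```

With gamma = 0 the modulating factor disappears and the expression reduces to alpha-weighted cross-entropy.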
18
Latency drivers?
⚡ easy
Answer: Backbone depth, input resolution, number of proposals, NMS cost, batch size—profile end-to-end for deployment.
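A simple way to profile end to end is to time the whole call, including pre- and post-processing, after a few warm-up runs. The helper below is a hypothetical sketch using only the standard library; on a GPU the device would also need to be synchronized before reading the clock, which is not shown here.

```python
import statistics
import time


def measure_latency(fn, *args, warmup=3, runs=20):
    """Time fn(*args) end to end; returns (median_ms, worst_ms)."""
    for _ in range(warmup):  # warm caches and lazy initialization
        fn(*args)
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(*args)
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings), max(timings)
```

Reporting the median alongside the worst case helps separate steady-state cost from occasional stalls.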
19
Detection vs segmentation?
⚡ easy
Answer: Boxes are coarse; segmentation gives pixel masks—detection often first stage in two-stage instance segmentation.
20
Per-image vs global AP?
📊 medium
Answer: Standard benchmarks aggregate over the whole dataset rather than averaging per-image scores. COCO pools all detections of a class across images, ranks them by confidence to build one PR curve, then averages AP over classes.
Detection Intro Cheat Sheet
Metrics
- IoU match
- mAP
Post
- NMS
Families
- Two-stage
- One-stage
💡 Pro tip: Tie TP/FP to IoU threshold before discussing mAP.