Object Detection Intro: 20 Essential Q&A

Question 1

1 What is object detection? ⚡ easy

Answer

Answer: Predict where objects are (bounding boxes or points) and what they are (class labels)—often multiple objects per image.

Question 2

2 Common bbox formats? 📊 medium

Answer

Answer: (x_min,y_min,x_max,y_max), (cx,cy,w,h), or normalized variants—be consistent when converting and computing IoU.
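The conversions between these formats are simple arithmetic, but mixing them up silently breaks IoU. A minimal sketch, assuming boxes are plain tuples of numbers; the function names are illustrative and not taken from any particular library:

```python
def xyxy_to_cxcywh(box):
    """Convert (x_min, y_min, x_max, y_max) to (cx, cy, w, h)."""
    x_min, y_min, x_max, y_max = box
    w = x_max - x_min
    h = y_max - y_min
    return (x_min + w / 2.0, y_min + h / 2.0, w, h)


def cxcywh_to_xyxy(box):
    """Convert (cx, cy, w, h) back to (x_min, y_min, x_max, y_max)."""
    cx, cy, w, h = box
    return (cx - w / 2.0, cy - h / 2.0, cx + w / 2.0, cy + h / 2.0)


def normalize_xyxy(box, img_w, img_h):
    """Scale pixel corner coordinates into the [0, 1] range."""
    x_min, y_min, x_max, y_max = box
    return (x_min / img_w, y_min / img_h, x_max / img_w, y_max / img_h)
```

Converting everything to one corner format before computing IoU is one way to stay consistent.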

Question 3

3 Define IoU. 📊 medium

Answer

Answer: Intersection area / union area of two boxes (or masks)—0 no overlap, 1 perfect match; used for matching preds to ground truth.
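For axis-aligned boxes the definition reduces to a few lines. This is a sketch that assumes the (x_min, y_min, x_max, y_max) format and treats degenerate (zero-area) pairs as IoU 0:

```python
def iou(a, b):
    """IoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    # Width and height of the overlap rectangle, clamped at zero.
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

For example, two 2x2 boxes offset by one unit in each direction overlap in a 1x1 square, so their IoU is 1 / (4 + 4 - 1) = 1/7.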

Question 4

4 TP vs FP for a detection? 📊 medium

Answer

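Answer: Match each prediction to a ground-truth (GT) box of the same class by IoU ≥ threshold, processing predictions in descending confidence order. A prediction that claims an unmatched GT is a TP; a prediction with no matching GT, or whose GT is already claimed by a higher-scoring prediction, is an FP; a GT left unmatched is an FN.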
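The matching rule can be sketched as a greedy loop over one class in one image. This assumes each GT may be claimed at most once and that a prediction is compared only against still-unmatched GT boxes; benchmarks differ in such details, so treat it as one plausible convention. The name match_detections is hypothetical.

```python
def iou(a, b):
    """IoU of two (x_min, y_min, x_max, y_max) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_detections(preds, gts, iou_thr=0.5):
    """preds: list of (score, box); gts: list of boxes (same class, same image).

    Returns (is_tp, num_fn), where is_tp follows descending score order.
    """
    preds = sorted(preds, key=lambda p: p[0], reverse=True)
    matched = set()
    is_tp = []
    for _score, box in preds:
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gts):
            if j in matched:
                continue  # each GT can be claimed only once
            v = iou(box, gt)
            if v > best_iou:
                best_iou, best_j = v, j
        if best_j >= 0 and best_iou >= iou_thr:
            matched.add(best_j)
            is_tp.append(True)   # true positive
        else:
            is_tp.append(False)  # false positive (duplicate or no overlap)
    return is_tp, len(gts) - len(matched)
```

With one GT and two overlapping predictions, only the higher-scoring one counts as a TP; the duplicate is an FP.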

Question 5

5 What is mAP? 🔥 hard

Answer

Answer: Mean Average Precision: the mean of per-class AP values. AP is the area under one class's precision-recall curve at a given IoU threshold (often IoU 0.5, or averaged over 0.5:0.95 on COCO).

Question 6

6 PR curve from detections? 📊 medium

Answer

Answer: Sort predictions by confidence; vary threshold to trace precision vs recall—AP is area under interpolated PR curve.
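As a sketch of that procedure, the function below takes TP/FP flags already sorted by descending confidence and the number of GT boxes for one class. It assumes all-point interpolation (precision made monotonically non-increasing, then integrated over recall); some benchmarks instead sample the curve at fixed recall points, so absolute numbers can differ slightly. The name average_precision is illustrative.

```python
def average_precision(is_tp, num_gt):
    """AP for one class from TP/FP flags sorted by descending confidence."""
    if num_gt == 0:
        return 0.0
    tp = fp = 0
    recalls, precisions = [], []
    for flag in is_tp:
        if flag:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Interpolate: precision at recall r becomes the max precision at any recall >= r.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum rectangle areas wherever recall increases.
    ap, prev_r = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_r) * p
        prev_r = r
    return ap
```

For flags [TP, FP, TP] with two GT boxes, recall steps from 0.5 to 1.0 and the interpolated precisions are 1 and 2/3, giving AP = 0.5 * 1 + 0.5 * 2/3 = 5/6.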

Question 7

7 Why NMS? 📊 medium

Answer

Answer: Many overlapping boxes fire on the same object. NMS keeps the highest-scoring box and suppresses lower-scoring boxes whose IoU with it exceeds a threshold; variants include soft-NMS and class-aware NMS.
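Greedy NMS can be sketched in pure Python as follows. The 0.5 threshold is only an illustrative default, and the O(n^2) loop is written for clarity rather than speed; production code would normally use a vectorized library routine.

```python
def iou(a, b):
    """IoU of two (x_min, y_min, x_max, y_max) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_thr=0.5):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_thr for k in keep):
            keep.append(i)
    return keep
```

Class-aware NMS would run this once per class, so boxes of different classes never suppress each other.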

Question 8

8 What is an anchor? 📊 medium

Answer

Answer: Predefined box prior at a feature map location; network predicts offsets + class—speeds convergence vs pure coordinate regression.
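To make "predicts offsets" concrete, here is a sketch of one widely used parameterization: center shifts scaled by the anchor size, and log-space width and height. The source does not say which encoding it has in mind, so this choice is an assumption, as are the function names.

```python
import math


def encode(anchor, gt):
    """Regression targets that map an anchor onto a GT box, both (cx, cy, w, h)."""
    acx, acy, aw, ah = anchor
    gcx, gcy, gw, gh = gt
    return ((gcx - acx) / aw, (gcy - acy) / ah,
            math.log(gw / aw), math.log(gh / ah))


def decode(anchor, deltas):
    """Apply predicted offsets to an anchor to recover a (cx, cy, w, h) box."""
    acx, acy, aw, ah = anchor
    dx, dy, dw, dh = deltas
    return (acx + dx * aw, acy + dy * ah,
            aw * math.exp(dw), ah * math.exp(dh))
```

Zero offsets return the anchor unchanged, so the network only has to learn small corrections around a reasonable prior.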

Question 9

9 Two-stage vs one-stage? 📊 medium

Answer

Answer: Two-stage: propose regions then classify (R-CNN family). One-stage: dense predictions in one pass (YOLO, SSD, RetinaNet)—usually faster, different error profile.

Question 10

10 Historical sliding window? ⚡ easy

Answer

Answer: Exhaustive windows + classifier—prohibitively slow; modern detectors replace with region proposals or dense anchors on feature maps.

Question 11

11 Multi-scale detection? 📊 medium

Answer

Answer: Image/feature pyramids, multi-scale anchors, or FPN so small and large objects are both seen at appropriate resolution.
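The multi-scale anchor idea can be sketched by tiling anchors over several feature levels, each with its own stride and base size. The stride/size pairs and the three aspect ratios below are illustrative assumptions, not values from the source.

```python
def make_anchors(img_w, img_h, stride, size, ratios=(0.5, 1.0, 2.0)):
    """Tile (cx, cy, w, h) anchors over one feature level.

    Each grid cell gets one anchor per aspect ratio (w / h), all with area size**2.
    """
    anchors = []
    for y in range(0, img_h, stride):
        for x in range(0, img_w, stride):
            cx, cy = x + stride / 2.0, y + stride / 2.0
            for r in ratios:
                w = size * (r ** 0.5)
                h = size / (r ** 0.5)
                anchors.append((cx, cy, w, h))
    return anchors


# Hypothetical pyramid: finer strides carry smaller anchors.
LEVELS = {8: 32, 16: 64, 32: 128}  # stride -> base anchor size in pixels


def make_pyramid_anchors(img_w, img_h, levels=LEVELS):
    out = []
    for stride, size in levels.items():
        out.extend(make_anchors(img_w, img_h, stride, size))
    return out
```

Small objects are then matched mostly by anchors on the fine, high-resolution level, and large objects by anchors on the coarse level.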

Question 12

12 Why are small objects hard? 📊 medium

Answer

Answer: Few pixels on feature maps, weak signal—higher-res inputs, specialized heads, and data augmentation (copy-paste) help.

Question 13

13 COCO AP@[.5:.95]? 🔥 hard

Answer

Answer: Average AP over IoU thresholds 0.50 to 0.95 step 0.05—rewards localization quality, not just 0.5 IoU hits.
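The averaging itself is straightforward. In this sketch, ap_at_iou is a hypothetical callable supplied by the caller that returns AP for a given IoU threshold; only the threshold grid and the mean are shown.

```python
def iou_thresholds():
    """The ten thresholds 0.50, 0.55, ..., 0.95."""
    # Built from integers to avoid floating-point drift from repeated addition.
    return [(50 + 5 * i) / 100 for i in range(10)]


def ap_50_95(ap_at_iou):
    """Mean of ap_at_iou(t) over the ten IoU thresholds."""
    thrs = iou_thresholds()
    return sum(ap_at_iou(t) for t in thrs) / len(thrs)
```

A detector whose boxes all pass IoU 0.75 but none pass 0.80 would score 1.0 at six thresholds and 0.0 at four, for an average of 0.6, even though its AP at IoU 0.5 alone is perfect.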

Question 14

14 Role of confidence score? ⚡ easy

Answer

Answer: Estimated probability of the class (sometimes combined with objectness). It ranks predictions when tracing the PR curve, orders boxes during NMS, and filters out low-scoring boxes via a score threshold.

Question 15

15 Multi-class boxes? 📊 medium

Answer

Answer: Each prediction has C-way softmax (or sigmoid per class for multi-label); match only within same predicted class for mAP.

Question 16

16 Assigning training targets? 📊 medium

Answer

Answer: Match anchors/points to GT by IoU or center rules; positives get box regression targets and class; negatives contribute to objectness / background loss.
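An IoU-based assignment rule can be sketched as below. The thresholds (positive at IoU ≥ 0.5, background below 0.4, ignored in between) are assumptions chosen for illustration; detectors differ in their exact values, and center-based rules are not shown.

```python
def iou(a, b):
    """IoU of two (x_min, y_min, x_max, y_max) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


BACKGROUND, IGNORE = -1, -2


def assign_targets(anchors, gts, pos_thr=0.5, neg_thr=0.4):
    """For each anchor return a GT index (positive), BACKGROUND, or IGNORE."""
    labels = []
    for anchor in anchors:
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gts):
            v = iou(anchor, gt)
            if v > best_iou:
                best_iou, best_j = v, j
        if best_j >= 0 and best_iou >= pos_thr:
            labels.append(best_j)      # positive: gets box + class targets
        elif best_iou < neg_thr:
            labels.append(BACKGROUND)  # negative: background loss only
        else:
            labels.append(IGNORE)      # ambiguous: excluded from the loss
    return labels
```

Positives would then receive regression targets computed against gts[label] together with that GT's class.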

Question 17

17 Many background anchors? 📊 medium

Answer

Answer: Extreme imbalance—addressed by sampling (hard negative mining), focal loss, or balanced loss weighting.
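Of these remedies, focal loss is the easiest to show in a few lines. The sketch below is the binary form for a single prediction, FL = -alpha_t * (1 - p_t)^gamma * log(p_t), with alpha = 0.25 and gamma = 2 as commonly cited defaults; the clamp constant is an assumption added for numerical safety.

```python
import math


def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-12):
    """Binary focal loss for predicted probability p and label y in {0, 1}."""
    p_t = p if y == 1 else 1.0 - p            # probability assigned to the true label
    alpha_t = alpha if y == 1 else 1.0 - alpha
    p_t = min(max(p_t, eps), 1.0)             # avoid log(0)
    # (1 - p_t)**gamma shrinks the loss of well-classified (easy) examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

An easy background anchor (p = 0.01, y = 0) contributes far less than a hard one (p = 0.9, y = 0), so the many easy negatives no longer dominate the gradient. Setting gamma = 0 recovers alpha-weighted cross-entropy.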

Question 18

18 Latency drivers? ⚡ easy

Answer

Answer: Backbone depth, input resolution, number of proposals, NMS cost, batch size—profile end-to-end for deployment.

Question 19

19 Detection vs segmentation? ⚡ easy

Answer

Answer: Boxes are coarse; segmentation gives per-pixel masks. Detection is often the first stage of two-stage instance segmentation.

Question 20

20 Per-image vs global AP? 📊 medium

Answer

Answer: Standard benchmarks aggregate over the whole dataset. Check whether a metric averages per-image scores or pools all detections before computing AP (COCO pools).
