Two-Stage Object Detection MCQ
Detection fundamentals and the R-CNN family—region proposals, Fast R-CNN, and Faster R-CNN.
Object Detection Intro MCQ
Object detection basics
Detectors output class scores and bounding boxes (and sometimes masks or keypoints). Training and evaluation both require matching predictions to ground-truth boxes; IoU is the usual overlap criterion.
mAP
Average precision (AP) is the area under the precision-recall curve traced out as the score threshold is lowered; mAP averages AP over classes (and, in COCO, also over IoU thresholds from 0.50 to 0.95).
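The AP definition above can be sketched in a few lines. This is a minimal illustration, not any benchmark's official scorer: it assumes each detection has already been labelled TP or FP, and it uses the all-point interpolated area under the curve (one common convention; others exist). The function names `average_precision` and `mean_average_precision` are made up for this example.

```python
def average_precision(detections, num_gt):
    """detections: list of (score, is_true_positive); num_gt: ground-truth count."""
    if num_gt == 0:
        return 0.0
    # Walking down the score-sorted list is the same as lowering the threshold.
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Interpolation (assumed convention): make precision non-increasing in recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Area under the curve as a sum of rectangles over recall steps.
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class):
    """per_class: dict mapping class name -> (detections, num_gt)."""
    aps = [average_precision(dets, n) for dets, n in per_class.values()]
    return sum(aps) / len(aps) if aps else 0.0
```

For example, three detections ranked TP, FP, TP against two ground-truth boxes give precision 1.0 at recall 0.5 and 2/3 at recall 1.0, so AP = 0.5 · 1.0 + 0.5 · 2/3 = 5/6.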
Vocabulary
IoU
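Intersection over union: the area of overlap between two boxes divided by the area of their union. For axis-aligned boxes both areas follow directly from the corner coordinates.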
TP / FP
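A prediction counts as a TP when it has the correct class and its IoU with a not-yet-matched ground-truth box meets the threshold. A prediction with the wrong class, too little overlap, or a duplicate hit on an already-matched ground-truth box counts as an FP.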
Sliding window
Classify every location/scale—expensive; modern nets predict boxes densely or from proposals.
Anchor / prior boxes
Template boxes at fixed sizes and aspect ratios, chosen to cover the box shapes common in the dataset; the network regresses offsets from them to the objects.
Output
List of (class, score, box) per detected object
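The IoU and TP / FP entries above can be made concrete with a short sketch. It assumes boxes are `(x1, y1, x2, y2)` corner tuples and predictions follow the `(class, score, box)` layout listed under Output. The greedy, highest-score-first matching and the 0.5 threshold are one common convention, not the only one; `label_detections` is a name invented for this example.

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Width and height clamp to zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_detections(preds, gts, iou_thresh=0.5):
    """preds: list of (cls, score, box); gts: list of (cls, box).

    Returns (score, is_tp) pairs in descending-score order.
    """
    matched = set()  # indices of ground-truth boxes already claimed
    results = []
    for cls, score, box in sorted(preds, key=lambda p: p[1], reverse=True):
        best_iou, best_j = 0.0, None
        for j, (g_cls, g_box) in enumerate(gts):
            if g_cls != cls or j in matched:
                continue  # wrong class, or this GT was already matched
            overlap = iou(box, g_box)
            if overlap > best_iou:
                best_iou, best_j = overlap, j
        if best_j is not None and best_iou >= iou_thresh:
            matched.add(best_j)
            results.append((score, True))
        else:
            results.append((score, False))
    return results
```

With a single ground-truth "cat" box, two overlapping "cat" predictions yield one TP (the higher-scoring one) and one FP (the duplicate), and a "dog" prediction on the same box is an FP because the class is wrong.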
R-CNN Family MCQ
R-CNN family
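R-CNN ran a CNN separately on each of roughly 2,000 warped proposal windows per image, which is slow. Fast R-CNN runs the CNN once per image and pools features per ROI. Faster R-CNN replaces external proposal methods with a Region Proposal Network (RPN) that works on the shared feature maps.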
ROI Pooling
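Quantizes each ROI onto the feature grid and max-pools it into a fixed H×W output, so ROIs of any size can be batched through the same FC heads. RoIAlign removes the quantization by sampling with bilinear interpolation.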
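As a rough sketch of the quantize-then-pool idea, the function below max-pools one ROI of a single-channel feature map into a fixed grid. It is a simplified illustration, not the layer from any particular framework: it assumes the ROI is already in feature-map coordinates, rounds it to whole cells, and splits it into bins with floor/ceil boundaries (bins may overlap by a cell when the ROI does not divide evenly). `roi_max_pool` is a hypothetical name.

```python
def roi_max_pool(feature, roi, out_h=2, out_w=2):
    """feature: 2D list (H x W); roi: (x1, y1, x2, y2) in feature-map units."""
    feat_h, feat_w = len(feature), len(feature[0])
    # Quantization step: snap the ROI to whole feature cells.
    x1, y1, x2, y2 = (int(round(v)) for v in roi)
    # Clamp to the map and keep the ROI at least one cell wide and tall.
    x1 = min(max(x1, 0), feat_w - 1)
    y1 = min(max(y1, 0), feat_h - 1)
    x2 = min(max(x2, x1 + 1), feat_w)
    y2 = min(max(y2, y1 + 1), feat_h)
    roi_h, roi_w = y2 - y1, x2 - x1
    pooled = []
    for i in range(out_h):
        ys = y1 + (i * roi_h) // out_h              # floor of bin start
        ye = y1 + -(-((i + 1) * roi_h) // out_h)    # ceil of bin end
        row = []
        for j in range(out_w):
            xs = x1 + (j * roi_w) // out_w
            xe = x1 + -(-((j + 1) * roi_w) // out_w)
            row.append(max(feature[y][x]
                           for y in range(ys, ye)
                           for x in range(xs, xe)))
        pooled.append(row)
    return pooled
```

Note that an ROI of `(0.4, 0.4, 3.6, 3.6)` is snapped to `(0, 0, 4, 4)` before pooling; that coordinate shift is the misalignment RoIAlign is designed to avoid.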
Evolution
R-CNN
External proposals plus a separate CNN forward pass per ROI; overlapping regions are recomputed many times.
Fast
Shared backbone; ROI pooling gathers features per proposal.
Faster
RPN classifies objectness + regresses boxes from anchors on the feature map.
Tradeoff
Two-stage detectors often reach higher mAP but run with higher latency than one-stage, YOLO-style detectors.
Faster R-CNN
Backbone → FPN (optional) → RPN → RoI head (cls + reg [+ mask])
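To illustrate how the RPN's anchors and box regression fit together, here is a small sketch. It assumes anchors are stored as `(cx, cy, w, h)`, places one set of anchors at the centre of every feature-map cell, and decodes regression outputs with the usual centre-offset and log-size parameterization (`tx`, `ty`, `tw`, `th`). The scales, ratios, and stride are illustrative values, and `make_anchors` and `decode` are names chosen for this example.

```python
import math


def make_anchors(feat_h, feat_w, stride,
                 scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return (cx, cy, w, h) anchors for every feature-map cell."""
    anchors = []
    for i in range(feat_h):
        for j in range(feat_w):
            # Assumed convention: anchor centre = cell centre in image pixels.
            cx, cy = (j + 0.5) * stride, (i + 0.5) * stride
            for s in scales:
                for r in ratios:  # r = height / width, area stays s * s
                    w, h = s / math.sqrt(r), s * math.sqrt(r)
                    anchors.append((cx, cy, w, h))
    return anchors


def decode(anchor, deltas):
    """Apply regression deltas (tx, ty, tw, th) to one anchor.

    Returns the predicted box as (x1, y1, x2, y2).
    """
    xa, ya, wa, ha = anchor
    tx, ty, tw, th = deltas
    cx, cy = xa + tx * wa, ya + ty * ha            # shift centre by a fraction of anchor size
    w, h = wa * math.exp(tw), ha * math.exp(th)    # scale size in log space
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
```

All-zero deltas return the anchor itself as a corner box, e.g. `decode((8, 8, 16, 16), (0, 0, 0, 0))` gives `(0.0, 0.0, 16.0, 16.0)`; the RoI head then refines such proposals further.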