One-Stage Object Detection MCQ
YOLO, RetinaNet, and SSD-style single-shot detectors for real-time localization.
YOLO (You Only Look Once) MCQ
YOLO overview
YOLO frames detection as regression over a grid: each cell predicts a small number of boxes, each with a confidence score, along with class probabilities. Later versions add anchor boxes, FPN-like multi-scale paths, and improved training losses.
Why fast?
A single CNN forward pass produces all predictions, with no separate region-proposal stage, which suits GPU batching and high video throughput.
Concepts
Cell responsibility
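Training assigns each object to the grid cell that contains its center. Early YOLO predicts a single set of class probabilities per cell, which limits how many objects one cell can detect.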
Versions
v3, v5, v8, and other versions differ in anchor strategy, loss, and architecture, but the underlying principle remains dense prediction in a single pass.
Tradeoff
Higher FPS often costs some mAP relative to heavy two-stage detectors on hard datasets.
Crowding
Many overlapping objects of the same class stress anchor and grid assignment, so both the NMS settings and the architecture matter.
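To make the crowding point concrete, here is a minimal sketch of greedy non-maximum suppression (NMS) in plain Python. The function names, the (x1, y1, x2, y2) box format, and the default threshold of 0.5 are assumptions chosen for illustration, not the post-processing of any particular YOLO release.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS for one class: return indices of kept boxes, best score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Two heavily overlapping boxes (IoU = 81/119, about 0.68) and one distant box.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)  # kept indices: [0, 2]
```

If the two overlapping boxes were really two adjacent objects of the same class, the second would be lost at a threshold of 0.5. Raising the threshold to 0.7 keeps all three, at the cost of more duplicates elsewhere.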
Idea
Image → CNN → tensors of box + class predictions
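As a rough illustration of the output tensor, the sketch below uses the original YOLO layout: an S × S grid, B boxes per cell with five numbers each (x, y, w, h, confidence), and C class probabilities per cell. The defaults S = 7, B = 2, C = 20 follow the original YOLO configuration on PASCAL VOC. The function names and the normalized-center convention are assumptions made for this example.

```python
def yolo_output_shape(S=7, B=2, C=20):
    """Shape of a YOLOv1-style output tensor: S x S x (B * 5 + C)."""
    return (S, S, B * 5 + C)


def responsible_cell(cx, cy, S=7):
    """Grid cell (row, col) containing a box center.

    cx and cy are assumed to be normalized to [0, 1] relative to image
    width and height; a center exactly on the far edge is clamped into
    the last cell.
    """
    col = min(int(cx * S), S - 1)
    row = min(int(cy * S), S - 1)
    return row, col


shape = yolo_output_shape()          # (7, 7, 30)
cell = responsible_cell(0.5, 0.5)    # (3, 3): the center cell of a 7 x 7 grid
```

Later versions predict at several scales with anchors, so their tensors have a different layout, but the idea of one dense tensor per forward pass is the same.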
RetinaNet MCQ
RetinaNet
One-stage detectors face extreme foreground/background imbalance: most anchors are easy negatives. Focal loss modulates cross-entropy (CE) with a focusing term so that training emphasizes hard examples.
Focal loss form
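FL(p_t) = −(1 − p_t)^γ log(p_t) (conceptually), where p_t is the predicted probability of the true class. A larger γ shrinks the loss for well-classified examples; γ = 0 recovers plain CE.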
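The effect of the focusing term is easiest to see numerically. The sketch below evaluates the form above for a single example, using only the standard library. It omits the class-balancing weight α that is usually applied alongside it, and the default gamma = 2.0 is a commonly used value taken here as an assumption.

```python
import math


def cross_entropy(p_t):
    """Plain cross-entropy for the true-class probability p_t in (0, 1]."""
    return -math.log(p_t)


def focal_loss(p_t, gamma=2.0):
    """Focal loss without the alpha weight: -(1 - p_t)^gamma * log(p_t)."""
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


# Easy example (p_t = 0.9): loss is scaled by (1 - 0.9)^2 = 0.01.
easy_ratio = focal_loss(0.9) / cross_entropy(0.9)
# Hard example (p_t = 0.1): loss is scaled by (1 - 0.1)^2 = 0.81.
hard_ratio = focal_loss(0.1) / cross_entropy(0.1)
```

With γ = 2, an easy example keeps about 1% of its cross-entropy loss while a hard example keeps about 81%, so thousands of easy negatives no longer dominate the gradient.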
Components
Dense anchors
FPN levels P3–P7 cover object scales; each location carries anchors at multiple scales and aspect ratios.
vs CE
Plain CE is overwhelmed by the many easy negatives; focal loss down-weights them dynamically according to how confidently each is classified.
Subnets
Separate classification and box-regression subnets both read the same FPN features; each subnet's weights are shared across pyramid levels.
Impact
Showed that a one-stage detector can approach two-stage accuracy when trained with a suitable loss.
Pyramid
Backbone → FPN → (cls, reg) at each level
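To give a sense of how dense the anchors are, the sketch below counts them per pyramid level. It assumes level P_l has stride 2^l, feature-map sizes rounded up, and 9 anchors per location (3 scales × 3 aspect ratios, as in the RetinaNet configuration). The function name and the 640 × 640 input size are illustrative choices.

```python
import math


def fpn_anchor_counts(height, width, levels=range(3, 8), anchors_per_location=9):
    """Number of anchors on each FPN level for a given input size."""
    counts = {}
    for level in levels:
        stride = 2 ** level  # P3 -> 8, P4 -> 16, ..., P7 -> 128
        feat_h = math.ceil(height / stride)
        feat_w = math.ceil(width / stride)
        counts[f"P{level}"] = feat_h * feat_w * anchors_per_location
    return counts


counts = fpn_anchor_counts(640, 640)
total = sum(counts.values())  # 76725 anchors for a 640 x 640 image
```

Only a handful of these anchors overlap a real object in a typical image, which is the imbalance that focal loss is designed to handle.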