CV MCQ: Chapter 9
One-Stage Object Detection

YOLO, RetinaNet, and SSD-style single-shot detectors for real-time localization.

YOLO (You Only Look Once) MCQ

YOLO overview

YOLO frames detection as a single regression problem over an S × S grid: each cell predicts a small, fixed number of boxes (coordinates plus a confidence score) together with class probabilities. Later versions add anchor boxes, FPN-like multi-scale paths, and improved training losses.
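
The grid formulation can be made concrete by counting the values each cell predicts. The sketch below is illustrative only: the sizes S = 7, B = 2, C = 20 follow the style of the original YOLO, and the helper names and the ordering of values inside a cell vector are assumptions, not the layout of any particular release.

```python
# Assumed sizes in the style of the original YOLO:
# S x S grid, B boxes per cell, C classes.
S, B, C = 7, 2, 20


def values_per_cell(num_boxes, num_classes):
    """Each box carries (x, y, w, h, confidence); class probabilities are per cell."""
    return num_boxes * 5 + num_classes


def split_cell(vector, num_boxes, num_classes):
    """Split one cell's flat prediction vector into boxes and class probabilities.

    The ordering (all boxes first, then classes) is an assumption for illustration.
    """
    if len(vector) != values_per_cell(num_boxes, num_classes):
        raise ValueError("unexpected cell vector length")
    boxes = [tuple(vector[i * 5:(i + 1) * 5]) for i in range(num_boxes)]
    class_probs = list(vector[num_boxes * 5:])
    return boxes, class_probs


# Shape of the full prediction tensor for one image.
output_shape = (S, S, values_per_cell(B, C))
```

With these assumed sizes, one forward pass yields a 7 × 7 × 30 tensor, which is the "regression from a grid" described above.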

Why fast?

A single forward pass of one CNN produces all predictions, with no separate region-proposal stage, which suits GPU batching and high video throughput.

Concepts

Cell responsibility

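Training assigns each object to the grid cell that contains its box center. In early YOLO each cell predicts a single set of class probabilities, so a cell can account for only one object.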
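
A minimal sketch of center-based assignment, assuming boxes are given as pixel corners (x_min, y_min, x_max, y_max). The function name `responsible_cell` is hypothetical and chosen for illustration.

```python
def responsible_cell(box, image_w, image_h, grid_size):
    """Return (row, col) of the grid cell that contains the box center.

    box is (x_min, y_min, x_max, y_max) in pixels.
    """
    x_min, y_min, x_max, y_max = box
    cx = (x_min + x_max) / 2.0
    cy = (y_min + y_max) / 2.0
    # Scale the center into grid units; clamp so a center lying exactly on
    # the right or bottom image edge still maps to the last cell.
    col = min(int(cx / image_w * grid_size), grid_size - 1)
    row = min(int(cy / image_h * grid_size), grid_size - 1)
    return row, col


# A 448 x 448 image with a 7 x 7 grid: center (150, 250) falls in row 3, col 2.
cell = responsible_cell((100, 200, 200, 300), 448, 448, 7)
```

Two objects whose centers land in the same cell compete for that cell, which is why small, tightly packed objects are hard for early grid-based designs.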

Versions

Versions such as v3, v5, and v8 differ in anchor strategy, loss, and architecture, but the underlying principle remains dense, single-pass prediction.

Tradeoff

Higher FPS often costs some mAP relative to heavy two-stage detectors on hard datasets.

Crowding

Many overlapping objects of the same class stress anchor/grid assignment, so NMS settings and architecture both matter.
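
To see why NMS matters in crowded scenes, here is a minimal sketch of greedy, single-class NMS in plain Python. The 0.5 IoU threshold is an assumed default for illustration; real detectors tune it and usually run a vectorized implementation.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes that overlap it too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)  # → [0, 2]
```

The second box is removed because its IoU with the first is about 0.68. If those two boxes were in fact two distinct, heavily overlapping objects, one true detection would be lost, which is the crowding failure mode described above.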

Idea

Image → CNN → tensors of box + class predictions

Pro tip: Read the paper or repo for the specific version you use; APIs and anchor rules change between releases.

RetinaNet MCQ

RetinaNet

One-stage detectors face extreme foreground/background imbalance: most of the densely sampled candidate locations are easy negatives. Focal loss modulates cross-entropy (CE) with a focusing term so that training emphasizes hard examples.

Focal loss form

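FL(p_t) = −(1 − p_t)^γ log(p_t) (conceptually), where p_t is the predicted probability of the true class. A larger γ shrinks the loss for well-classified examples (p_t near 1); γ = 0 recovers plain CE.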
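
A small numeric sketch of the formula above, using only the standard library. The default γ = 2 is an assumed value for illustration, and the function names are hypothetical.

```python
import math


def cross_entropy(p_t):
    """Plain CE for one example, where p_t is the probability of the true class."""
    return -math.log(p_t)


def focal_loss(p_t, gamma=2.0):
    """Focal loss without the alpha weighting: CE scaled by (1 - p_t) ** gamma."""
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


easy = 0.9   # well-classified example
hard = 0.1   # badly misclassified example

# How much the focusing term shrinks each example's loss relative to CE.
easy_scale = focal_loss(easy) / cross_entropy(easy)   # (1 - 0.9) ** 2, about 0.01
hard_scale = focal_loss(hard) / cross_entropy(hard)   # (1 - 0.1) ** 2, about 0.81
```

With γ = 2, the easy example's loss is cut by roughly a factor of 100 while the hard example keeps about 81% of its CE loss, so the gradient is dominated by hard examples rather than by the sheer number of easy ones.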

Components

Dense anchors

FPN levels P3–P7 cover object scales from small to large; each spatial location carries multiple anchors with different aspect ratios.
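
The dense-anchor layout can be sketched by enumerating anchors per level. The base sizes (32 to 512), strides (8 to 128), three aspect ratios, and three sub-octave scales below are assumed values in the style commonly used with RetinaNet; actual configurations vary by implementation, and the helper names are hypothetical.

```python
import math

# Assumed per-level settings: (anchor base size in pixels, feature stride).
LEVELS = {"P3": (32, 8), "P4": (64, 16), "P5": (128, 32),
          "P6": (256, 64), "P7": (512, 128)}
RATIOS = (0.5, 1.0, 2.0)                       # height / width
SCALES = (2 ** 0, 2 ** (1 / 3), 2 ** (2 / 3))  # sub-octave size multipliers


def anchors_per_location(base_size, ratios=RATIOS, scales=SCALES):
    """Return (width, height) pairs for every anchor at one spatial location."""
    anchors = []
    for scale in scales:
        area = (base_size * scale) ** 2
        for ratio in ratios:
            w = math.sqrt(area / ratio)  # keeps w * h equal to area
            h = w * ratio
            anchors.append((w, h))
    return anchors


def total_anchors(image_w, image_h):
    """Count anchors over all levels for one image."""
    total = 0
    for base_size, stride in LEVELS.values():
        locations = math.ceil(image_w / stride) * math.ceil(image_h / stride)
        total += locations * len(anchors_per_location(base_size))
    return total


count = total_anchors(640, 640)
```

Under these assumptions a 640 × 640 image produces tens of thousands of anchors, while a typical image contains only a handful of objects. That ratio is the source of the foreground/background imbalance that focal loss addresses.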

vs CE

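Plain CE is overwhelmed by the summed loss of many easy negatives; focal loss down-weights each example dynamically, according to how confidently it is already classified.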

Subnets

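The classification and box-regression subnets are separate small convolutional heads that consume the same FPN features; each head's weights are shared across pyramid levels.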

Impact

RetinaNet showed that a one-stage detector can approach two-stage accuracy when trained with a suitable loss.

Pyramid

Backbone → FPN → (cls, reg) at each level

Pro tip: In the full focal loss, a weighting factor α balances the contribution of positives and negatives, alongside γ.
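
A minimal sketch of the α-balanced form for a single binary (foreground vs background) prediction. The defaults α = 0.25 and γ = 2 are assumed values for illustration, and the function name is hypothetical.

```python
import math


def alpha_balanced_focal_loss(p, is_positive, alpha=0.25, gamma=2.0):
    """Focal loss with class weighting for one binary prediction.

    p is the predicted foreground probability. Positives are weighted by
    alpha and negatives by (1 - alpha).
    """
    p_t = p if is_positive else 1.0 - p
    alpha_t = alpha if is_positive else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# A positive and a negative that are both classified with p_t = 0.9.
loss_pos = alpha_balanced_focal_loss(0.9, is_positive=True)
loss_neg = alpha_balanced_focal_loss(0.1, is_positive=False)
```

Both examples have the same p_t, so the focusing term treats them identically; only α changes their relative weight. α and γ interact, so they are usually tuned together rather than independently.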