Semantic Segmentation MCQ
Dense labeling: each pixel gets a class. Covers encoder–decoder architectures, multi-scale context, skip connections, and how segmentation metrics differ from detection metrics.
Per-pixel: class map
Encoder: downsample
Decoder: upsample
IoU: overlap
Semantic segmentation
Every pixel is classified (sky, road, person, …) without separating object instances. Fully convolutional networks replaced sliding-window classifiers for dense outputs.
U-Net idea
Skip connections fuse fine spatial detail from the encoder with the semantic features the decoder upsamples from the bottleneck, which helps recover sharp boundaries.
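The fusion step can be sketched in NumPy under some simplifying assumptions: feature maps are stored as (H, W, C) arrays, nearest-neighbour upsampling stands in for a learned up-convolution, and fusion is channel concatenation. The names `upsample_nearest` and `fuse_skip` and all shapes are illustrative, not taken from any library.

```python
import numpy as np

def upsample_nearest(x, factor=2):
    """Nearest-neighbour upsampling of an (H, W, C) feature map."""
    return x.repeat(factor, axis=0).repeat(factor, axis=1)

def fuse_skip(decoder_feat, encoder_feat):
    """Upsample decoder features to the encoder's resolution, then
    concatenate both along the channel axis."""
    factor = encoder_feat.shape[0] // decoder_feat.shape[0]
    up = upsample_nearest(decoder_feat, factor)
    return np.concatenate([encoder_feat, up], axis=-1)

rng = np.random.default_rng(0)
encoder_feat = rng.random((64, 64, 32))   # high resolution, fine detail
bottleneck = rng.random((32, 32, 128))    # low resolution, more semantic

fused = fuse_skip(bottleneck, encoder_feat)
print(fused.shape)  # (64, 64, 160)
```

In a real network a convolution would follow the concatenation, so the decoder can learn how to mix the two sources.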
Building blocks
FCN
Replace FC layers with equivalent convolutions (1×1 for the final classifier) so the network accepts any input size; upsample the coarse score maps to input resolution.
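A small NumPy check may make the FC-to-conv swap concrete: a 1×1 convolution is the same dense layer applied independently at every pixel. The shapes, the random weights, and the stride of 8 are assumptions for illustration, and nearest-neighbour repetition stands in for bilinear or learned upsampling.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, C_in, num_classes = 4, 6, 8, 3

features = rng.normal(size=(H, W, C_in))        # coarse feature map
weight = rng.normal(size=(C_in, num_classes))   # hypothetical classifier weights
bias = rng.normal(size=num_classes)

# 1x1 convolution: contract the channel axis at every spatial location.
logits_conv = np.einsum("hwc,ck->hwk", features, weight) + bias

# Same computation as a dense layer over flattened pixels.
logits_dense = (features.reshape(-1, C_in) @ weight + bias).reshape(H, W, num_classes)
print(np.allclose(logits_conv, logits_dense))   # True

# Upsample the coarse logits back toward input resolution (assumed stride 8).
stride = 8
logits_full = logits_conv.repeat(stride, axis=0).repeat(stride, axis=1)
print(logits_full.shape)                        # (32, 48, 3)
```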
ASPP / dilation
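Dilated (atrous) convolutions enlarge the receptive field without further downsampling; ASPP runs several dilation rates in parallel to capture multi-scale context.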
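A 1-D toy can show how dilation widens the receptive field while the output keeps the input's length. This is only a sketch: real ASPP uses learned 2-D convolutions and usually extra branches, while here the kernel is fixed, the rates (1, 2, 4) are arbitrary, and `dilated_conv1d_same` is a hypothetical helper.

```python
import numpy as np

def dilated_conv1d_same(x, kernel, dilation=1):
    """Zero-padded 1-D cross-correlation whose kernel taps are spaced
    `dilation` samples apart. Assumes an odd kernel length."""
    k = len(kernel)
    pad = dilation * (k - 1) // 2
    xp = np.pad(x, pad)
    span = (k - 1) * dilation + 1     # receptive field of one output sample
    return np.array([
        np.dot(xp[i : i + span : dilation], kernel)
        for i in range(len(x))
    ])

x = np.arange(16, dtype=float)
kernel = np.ones(3)
rates = (1, 2, 4)

# ASPP-style: run the same input through parallel branches with different
# dilation rates, then stack the results as channels.
branches = [dilated_conv1d_same(x, kernel, r) for r in rates]
aspp = np.stack(branches, axis=-1)

for r in rates:
    print("rate", r, "receptive field", (len(kernel) - 1) * r + 1)
print(aspp.shape)  # (16, 3): resolution is unchanged
```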
Loss
Per-pixel cross-entropy, averaged over pixels; under class imbalance, use class weights or a focal-style loss that down-weights easy pixels.
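As a minimal sketch of the loss, assuming (H, W, C) logits and an (H, W) integer label map, per-pixel cross-entropy with optional class weights could look like this. The function name, the toy label map, and the weights `[1.0, 15.0]` are illustrative choices; the focal variant is not implemented here.

```python
import numpy as np

def pixel_cross_entropy(logits, labels, class_weights=None):
    """Per-pixel cross-entropy. logits: (H, W, C); labels: (H, W) ints.
    With class_weights, returns the weight-normalised mean."""
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Log-probability of the true class at every pixel.
    picked = np.take_along_axis(log_probs, labels[..., None], axis=-1)[..., 0]
    if class_weights is None:
        return -picked.mean()
    w = np.asarray(class_weights, dtype=float)[labels]
    return -(w * picked).sum() / w.sum()

rng = np.random.default_rng(0)
logits = rng.normal(size=(8, 8, 2))
labels = np.zeros((8, 8), dtype=int)
labels[3:5, 3:5] = 1                     # rare class: 4 of 64 pixels

plain = pixel_cross_entropy(logits, labels)
weighted = pixel_cross_entropy(logits, labels, class_weights=[1.0, 15.0])
print(plain, weighted)
```

With these weights the four rare pixels contribute as much total weight as the sixty background pixels, so errors on the rare class are no longer drowned out.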
vs instance
Semantic segmentation gives two people the same class label with no separate IDs; instance segmentation adds per-object IDs.
Output
H×W label map (or H×W×C logits, with the label map taken as the argmax over the C classes)
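To tie the output shapes to the IoU overlap metric, the sketch below turns H×W×C logits into a label map with an argmax and scores it per class. The tiny 2×4 maps and the helper name `per_class_iou` are made up for illustration; the logits are built as one-hot scores purely so the argmax result is known.

```python
import numpy as np

def per_class_iou(pred, target, num_classes):
    """Intersection over union for each class; NaN if a class is absent
    from both prediction and target."""
    ious = []
    for c in range(num_classes):
        p, t = pred == c, target == c
        inter = np.logical_and(p, t).sum()
        union = np.logical_or(p, t).sum()
        ious.append(inter / union if union else float("nan"))
    return np.array(ious)

target = np.array([[0, 0, 1, 1],
                   [0, 0, 1, 1]])
intended = np.array([[0, 0, 0, 1],
                     [0, 0, 1, 1]])

logits = np.eye(2)[intended]          # (H, W, C) scores, here one-hot
pred = logits.argmax(axis=-1)         # (H, W) label map

ious = per_class_iou(pred, target, num_classes=2)
print(ious)                # [0.8  0.75]
print(np.nanmean(ious))    # mean IoU
```

Unlike detection, where IoU compares boxes, here the overlap is counted pixel by pixel for each class.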