Computer Vision Interview
20 essential Q&A
Updated 2026
Semantic Segmentation: 20 Essential Q&A
Pixel-wise class labels, encoder–decoder designs, and how we score dense prediction.
~12 min read
20 questions
Advanced
FCN · U-Net · mIoU · Dice
Quick Navigation
1. What is semantic segmentation?
2. vs image classification
3. Fully convolutional (FCN)
4. U-Net & skip connections
5. Upsampling / decoder
6. mIoU metric
7. Dice / F1 on masks
1
What is semantic segmentation?
⚡ easy
Answer: Assigning a class label to every pixel (road, sky, person)—no distinction between different instances of the same class.
2
How does it differ from classification?
⚡ easy
Answer: Classification: one label per image. Semantic segmentation: dense spatial map of labels—requires localization and context.
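The difference in output shape can be made concrete with a minimal NumPy sketch. The random logits stand in for a real network's output, and the names `image_logits` and `pixel_logits` are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
num_classes, height, width = 3, 4, 5

# Classification head: one score vector for the whole image.
image_logits = rng.normal(size=(num_classes,))
image_label = int(np.argmax(image_logits))      # a single class index

# Segmentation head: one score vector per pixel, shape (C, H, W).
pixel_logits = rng.normal(size=(num_classes, height, width))
label_map = np.argmax(pixel_logits, axis=0)     # one class index per pixel

print(label_map.shape)  # (4, 5)
```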
3
What did FCN change?
📊 medium
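Answer: Recast the fully connected layers as convolutions (the final classifier becomes a 1×1 conv), so the network accepts arbitrary input sizes and outputs a spatial score map; learnable upsampling (transposed convolution, a.k.a. deconv) plus skips from earlier pooling layers recover resolution.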
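A 1×1 convolution is a linear classifier applied independently at every pixel, which is why it is indifferent to input size. The sketch below illustrates this in NumPy; `conv1x1` is a hypothetical helper written for this example, not a library call.

```python
import numpy as np

def conv1x1(features, weight, bias):
    """Apply a 1x1 convolution: features (C_in, H, W) -> (C_out, H, W)."""
    return np.einsum("oc,chw->ohw", weight, features) + bias[:, None, None]

rng = np.random.default_rng(0)
weight = rng.normal(size=(3, 8))   # 3 classes, 8 input channels
bias = rng.normal(size=3)

# The same weights work for any spatial size.
small = conv1x1(rng.normal(size=(8, 4, 4)), weight, bias)
large = conv1x1(rng.normal(size=(8, 7, 10)), weight, bias)
print(small.shape, large.shape)  # (3, 4, 4) (3, 7, 10)

# At one pixel, the output equals a fully connected layer on that pixel's features.
feats = rng.normal(size=(8, 5, 6))
out = conv1x1(feats, weight, bias)
per_pixel = weight @ feats[:, 2, 3] + bias
```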
4
Why U-Net skips?
📊 medium
Answer: Encoder downsamples for context; decoder upsamples; skip connections fuse fine detail from shallow layers with semantic deep features—sharp boundaries.
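A toy NumPy sketch of one skip connection, under simplifying assumptions: average pooling stands in for the encoder, nearest-neighbour upsampling for the decoder, and no convolutions are learned. The helper names are hypothetical.

```python
import numpy as np

def avg_pool2x(x):
    """Downsample (C, H, W) by 2 with average pooling; H and W must be even."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def upsample2x(x):
    """Nearest-neighbour upsampling of (C, H, W) by 2."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

rng = np.random.default_rng(0)
shallow = rng.normal(size=(4, 8, 8))   # high-resolution features with fine detail
deep = avg_pool2x(shallow)             # stand-in for deeper features, shape (4, 4, 4)
decoded = upsample2x(deep)             # back to (4, 8, 8), but detail is lost

# Skip connection: concatenate shallow detail with the upsampled deep features.
fused = np.concatenate([shallow, decoded], axis=0)
print(fused.shape)  # (8, 8, 8)
```

Without the skip, the decoder only sees `decoded`, which is constant inside each 2×2 block, so it cannot reconstruct sharp boundaries on its own.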
5
Common upsampling methods?
📊 medium
Answer: Transposed convolution, bilinear upsample + conv, sub-pixel shuffle—each trades artifacts, parameters, and speed differently.
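Of the methods listed, sub-pixel shuffle is the least intuitive, so here is a minimal NumPy sketch of the rearrangement alone (the convolution that normally precedes it is omitted). `pixel_shuffle` is a hypothetical helper, and the channel ordering is an assumption chosen for this example.

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange (C * r * r, H, W) into (C, H * r, W * r)."""
    c_in, h, w = x.shape
    c_out = c_in // (r * r)
    x = x.reshape(c_out, r, r, h, w)
    x = x.transpose(0, 3, 1, 4, 2)      # reorder to (C, H, r, W, r)
    return x.reshape(c_out, h * r, w * r)

# Four channels at one low-resolution pixel become a 2x2 output patch.
low_res = np.arange(4).reshape(4, 1, 1)
print(pixel_shuffle(low_res, 2)[0])
# [[0 1]
#  [2 3]]
```

The upsampling itself has no parameters; all learning happens in the preceding convolution that produces the extra channels.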
6
What is mIoU?
📊 medium
Answer: Mean Intersection over Union: compute IoU = TP / (TP + FP + FN) for each class, then average across classes. It measures the overlap between predicted and ground-truth masks and is the standard benchmark metric.
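A minimal NumPy sketch of mIoU computed from a confusion matrix. Skipping classes that appear in neither map is one common convention and is an assumption here; benchmarks differ in how they treat absent classes and ignore labels.

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """mIoU from a confusion matrix; classes absent from both maps are skipped."""
    idx = target.ravel() * num_classes + pred.ravel()
    conf = np.bincount(idx, minlength=num_classes ** 2)
    conf = conf.reshape(num_classes, num_classes)   # rows: truth, cols: prediction
    intersection = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - intersection
    valid = union > 0
    return float((intersection[valid] / union[valid]).mean())

target = np.array([[0, 0, 1],
                   [0, 1, 1]])
pred = np.array([[0, 1, 1],
                 [0, 1, 1]])

# Class 0: IoU = 2/3. Class 1: IoU = 3/4. Mean = 17/24.
print(mean_iou(pred, target, num_classes=2))
```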
7
What is Dice coefficient?
📊 medium
Answer: 2|A∩B|/(|A|+|B|), which equals the F1 score for binary masks; common as a loss in medical segmentation when the foreground is tiny.
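The link between Dice and F1 can be checked numerically. This is a small sketch for binary masks, assuming both masks are non-empty so no division by zero occurs; the function names are illustrative.

```python
import numpy as np

def dice(pred, target):
    """Dice coefficient for binary masks (assumes at least one positive pixel)."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return 2.0 * intersection / (pred.sum() + target.sum())

def f1(pred, target):
    """F1 score from pixel-level precision and recall."""
    pred, target = pred.astype(bool), target.astype(bool)
    tp = np.logical_and(pred, target).sum()
    fp = np.logical_and(pred, ~target).sum()
    fn = np.logical_and(~pred, target).sum()
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2.0 * precision * recall / (precision + recall)

pred = np.array([1, 1, 1, 0, 0, 0])
target = np.array([1, 1, 0, 1, 0, 0])

# Both are 2/3 here: TP = 2, FP = 1, FN = 1.
print(dice(pred, target), f1(pred, target))
```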
8
Standard loss?
⚡ easy
Answer: Per-pixel cross-entropy (softmax over classes); can weight rare classes or use focal variants for hard pixels.
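A minimal NumPy sketch of per-pixel cross-entropy with optional class weights. Normalising the weighted loss by the sum of weights is one common convention and is an assumption here; `pixel_cross_entropy` is a hypothetical helper.

```python
import numpy as np

def pixel_cross_entropy(logits, target, class_weights=None):
    """logits: (C, H, W); target: (H, W) integer labels. Returns the mean loss."""
    shifted = logits - logits.max(axis=0, keepdims=True)   # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=0, keepdims=True))
    # Pick the log-probability of the true class at each pixel.
    picked = np.take_along_axis(log_probs, target[None], axis=0)[0]
    if class_weights is None:
        return float(-picked.mean())
    w = class_weights[target]                               # per-pixel weight
    return float(-(w * picked).sum() / w.sum())

# With all-zero logits the softmax is uniform, so the loss is log(C).
logits = np.zeros((4, 2, 3))
target = np.array([[0, 1, 2],
                   [3, 0, 1]])
print(pixel_cross_entropy(logits, target))  # log(4), about 1.386
```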
9
Why are boundaries hard?
🔥 hard
Answer: Edges are ambiguous and thin structures disappear at low resolution. Fixes: deep supervision, boundary-aware loss, high-res branches, or larger input crops.
10
Handle class imbalance?
📊 medium
Answer: Weighted CE, oversampling rare classes, focal loss, dice loss, or balanced sampling in batches.
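Two of these remedies fit in a few lines. The sketch below assumes plain inverse-frequency class weights and the standard focal term (1 - p)^gamma applied to the true-class probability; both helpers are hypothetical names for this example.

```python
import numpy as np

def inverse_frequency_weights(target, num_classes):
    """Weight each class by 1 / pixel frequency; absent classes get weight 0."""
    counts = np.bincount(target.ravel(), minlength=num_classes).astype(float)
    freq = counts / counts.sum()
    return np.where(counts > 0, 1.0 / np.maximum(freq, 1e-12), 0.0)

def focal_loss(p_true, gamma=2.0):
    """Per-pixel focal loss given the predicted probability of the true class."""
    p_true = np.asarray(p_true, dtype=float)
    return -((1.0 - p_true) ** gamma) * np.log(p_true)

# 9 background pixels and 1 foreground pixel.
target = np.array([[0, 0, 0, 0, 0],
                   [0, 0, 0, 0, 1]])
weights = inverse_frequency_weights(target, num_classes=2)
print(weights)  # background about 1.11, foreground 10.0

# With gamma = 2, an easy pixel (p = 0.9) is scaled by 0.01, a hard one (p = 0.1) by 0.81.
easy, hard = focal_loss(0.9), focal_loss(0.1)
```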
11
What is ASPP?
🔥 hard
Answer: Atrous spatial pyramid pooling—parallel dilated convs at multiple rates capture multi-scale context without losing resolution (DeepLab family).
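To see how dilation widens context without reducing resolution, here is a 1-D NumPy sketch. It is a simplification: real ASPP is 2-D, learns separate weights per branch, and adds a 1×1 branch and image-level pooling. The rates and helper names below are assumptions for illustration.

```python
import numpy as np

def dilated_conv1d(signal, kernel, rate):
    """'Same'-padded 1-D convolution with gaps of `rate` between kernel taps."""
    k = len(kernel)
    pad = rate * (k - 1) // 2
    padded = np.pad(signal, pad)
    out = np.zeros(len(signal))
    for j in range(k):
        out += kernel[j] * padded[j * rate : j * rate + len(signal)]
    return out

def effective_kernel_size(k, rate):
    """Span covered by a k-tap kernel at the given dilation rate."""
    return k + (k - 1) * (rate - 1)

def aspp_1d(signal, kernel, rates):
    """Parallel dilated branches stacked as channels (one shared kernel for brevity)."""
    return np.stack([dilated_conv1d(signal, kernel, r) for r in rates])

impulse = np.zeros(21)
impulse[10] = 1.0
branches = aspp_1d(impulse, np.ones(3), rates=(1, 2, 4))

print(branches.shape)               # (3, 21): resolution is unchanged
print(np.flatnonzero(branches[2]))  # [ 6 10 14]: three taps spread over 9 positions
```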
12
What is PSPNet idea?
📊 medium
Answer: Pyramid pooling at several scales then upsample and concatenate—rich global scene context for each pixel.
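A rough NumPy sketch of pyramid pooling, under stated simplifications: the feature map size is divisible by every bin count, nearest-neighbour upsampling replaces interpolation, and the per-branch channel-reducing convolutions are omitted. The bin sizes and helper names are assumptions for this example.

```python
import numpy as np

def pool_to_bins(x, bins):
    """Average-pool (C, H, W) to (C, bins, bins); H and W must be divisible by bins."""
    c, h, w = x.shape
    return x.reshape(c, bins, h // bins, bins, w // bins).mean(axis=(2, 4))

def upsample_nearest(x, size):
    """Nearest-neighbour upsampling of a square (C, b, b) map to (C, size, size)."""
    factor = size // x.shape[1]
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def pyramid_pooling(x, bin_sizes=(1, 2, 3, 6)):
    """Concatenate the input with pooled-then-upsampled copies at each scale."""
    size = x.shape[1]
    branches = [upsample_nearest(pool_to_bins(x, b), size) for b in bin_sizes]
    return np.concatenate([x] + branches, axis=0)

rng = np.random.default_rng(0)
features = rng.normal(size=(2, 6, 6))
fused = pyramid_pooling(features)
print(fused.shape)  # (10, 6, 6): 2 original channels + 4 branches of 2 channels
```

The 1-bin branch gives every pixel the global average of the feature map, which is the scene-level context the answer refers to.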
13
Multi-scale inference?
📊 medium
Answer: Run network on several scales / flipped inputs and average logits—boosts mIoU at inference cost.
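The procedure can be sketched as follows. `toy_model` is a hypothetical stand-in for a trained network, nearest-neighbour resizing replaces the interpolation a real pipeline would use, and the scale set is an arbitrary choice.

```python
import numpy as np

def resize_nearest(x, height, width):
    """Nearest-neighbour resize of a (C, H, W) array."""
    rows = np.arange(height) * x.shape[1] // height
    cols = np.arange(width) * x.shape[2] // width
    return x[:, rows[:, None], cols[None, :]]

def toy_model(image):
    """Hypothetical network: (3, H, W) image -> (2, H, W) logits."""
    brightness = image.mean(axis=0)
    return np.stack([1.0 - brightness, brightness])

def multi_scale_flip_inference(model, image, scales=(0.5, 1.0, 2.0)):
    """Average logits over scales and horizontal flips at the original resolution."""
    _, h, w = image.shape
    total, runs = 0.0, 0
    for s in scales:
        scaled = resize_nearest(image, int(h * s), int(w * s))
        for flip in (False, True):
            logits = model(scaled[:, :, ::-1] if flip else scaled)
            if flip:
                logits = logits[:, :, ::-1]   # undo the flip before averaging
            total = total + resize_nearest(logits, h, w)
            runs += 1
    return total / runs

rng = np.random.default_rng(0)
image = rng.uniform(size=(3, 8, 8))
avg_logits = multi_scale_flip_inference(toy_model, image)
label_map = avg_logits.argmax(axis=0)
print(avg_logits.shape)  # (2, 8, 8)
```

Each extra scale or flip is another forward pass, which is the inference cost the answer mentions.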
14
Weakly supervised segmentation?
🔥 hard
Answer: Train from image tags, scribbles, or bounding boxes using constraints (e.g. MIL, GrabCut-style seeds), so fewer pixel-level labels are needed.
15
Link to panoptic?
📊 medium
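Answer: Panoptic segmentation gives every pixel a class label and, for countable “things”, an instance ID as well; “stuff” regions keep only the class label. Semantic segmentation is therefore one component of full scene parsing.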
16
Use CRF post-processing?
📊 medium
Answer: Historically, CRFs refined CNN outputs with pairwise smoothness terms; they are less dominant now that architectures are stronger, but still come up in interviews.
17
Can semantic separate two people?
⚡ easy
Answer: No—both get label “person”; need instance segmentation for separate masks.
18
Why is data expensive?
⚡ easy
Answer: Every image needs a pixel-accurate mask, which takes far longer to annotate than bounding boxes; semi-automatic labeling tools and synthetic data help reduce the cost.
19
Transformers for segmentation?
🔥 hard
Answer: SegFormer, Mask2Former, Segmenter—global attention and mask queries compete with CNN encoders on benchmarks.
20
Real-time models?
📊 medium
Answer: Lightweight backbones (MobileNet), BiSeNet, Fast-SCNN—trade mIoU for FPS on edge devices.
Semantic Segmentation Cheat Sheet
Architecture
- Encoder–decoder
- Skips (U-Net)
Metric
- mIoU
- Dice (medical)
Context
- ASPP / PSP
- Multi-scale test
💡 Pro tip: Semantic segmentation produces dense per-pixel labels; all instances of a class share one semantic mask.