Computer Vision Interview
Updated 2026
Instance Segmentation: 20 Essential Q&A
Separate masks per object instance—Mask R-CNN and the overlap problem.
Advanced
Mask R-CNN · RoIAlign · mask AP · FCOS
1
What is instance segmentation?
⚡ easy
Answer: Each object instance gets its own binary mask and class label. Two pixels that both belong to the class “person” are assigned to different instances if they lie on different people.
2
Semantic vs instance?
📊 medium
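Answer: Semantic segmentation assigns one class label per pixel, so all objects of a class share a single mask. Instance segmentation produces N masks for N objects, possibly of the same class, with a distinct ID per object so that touching or overlapping objects stay separate.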
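The difference can be made concrete with a toy label map. The following is a minimal sketch; the 4×6 scene, the class id 1 for “person”, and all variable names are made up for illustration.

```python
import numpy as np

# Hypothetical 4x6 scene: two "person" objects (class id 1) on background (0).
instance_ids = np.array([
    [0, 1, 1, 0, 2, 2],
    [0, 1, 1, 0, 2, 2],
    [0, 1, 1, 0, 2, 2],
    [0, 0, 0, 0, 0, 0],
])
instance_class = {1: 1, 2: 1}  # both instances belong to class 1

# Semantic view: one label per pixel, so instances of a class are merged.
semantic = np.zeros_like(instance_ids)
for inst_id, cls in instance_class.items():
    semantic[instance_ids == inst_id] = cls

# Instance view: one binary mask per object.
instance_masks = {i: instance_ids == i for i in instance_class}

num_semantic_classes = len(np.unique(semantic)) - 1  # foreground classes only
num_instances = len(instance_masks)
```

Here the semantic map contains a single foreground class, while the instance view keeps two separate masks.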
3
How does Mask R-CNN extend Faster R-CNN?
📊 medium
Answer: Adds a mask head in parallel with the box and class heads: a small FCN applied to each RoI predicts an m×m binary mask for each of the K classes, and the three tasks are trained jointly with a multi-task loss.
4
Why RoIAlign?
🔥 hard
Answer: RoIPool quantizes coordinates → misalignment for masks. RoIAlign uses bilinear sampling at exact float locations—critical for pixel-accurate masks.
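To illustrate the difference, here is a minimal NumPy sketch that samples one point of a feature map both ways. The helper `bilinear_sample` and the toy feature map are assumptions for illustration; a real RoIAlign samples several such points per output bin and pools them.

```python
import numpy as np

def bilinear_sample(feat, y, x):
    """Sample a 2D feature map at a float (y, x) location."""
    h, w = feat.shape
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = (1 - dx) * feat[y0, x0] + dx * feat[y0, x1]
    bot = (1 - dx) * feat[y1, x0] + dx * feat[y1, x1]
    return (1 - dy) * top + dy * bot

feat = np.arange(16, dtype=float).reshape(4, 4)  # feat[y, x] = 4*y + x
y, x = 1.5, 2.25

aligned = bilinear_sample(feat, y, x)  # RoIAlign-style: keep float coordinates
quantized = feat[int(y), int(x)]       # RoIPool-style: round coordinates down
```

On this linear toy map the aligned sample recovers the exact value 8.25, while the quantized lookup returns 6.0, the value of a neighbouring cell.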
5
Mask branch output?
📊 medium
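Answer: Typically 28×28 logits per class. At inference the mask for the predicted class is passed through a sigmoid, resized to the RoI box, and binarized at a threshold (0.5 in the original paper). The head is a lightweight per-region FCN.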
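The post-processing step can be sketched as follows. The function `paste_mask`, its box convention (integer pixels, exclusive right and bottom edges), and the nearest-neighbour resize are simplifying assumptions; real pipelines usually interpolate bilinearly.

```python
import numpy as np

def paste_mask(mask_logits, box, image_hw, threshold=0.5):
    """Resize a small square mask to its box and binarize it in image coordinates.

    box is (x0, y0, x1, y1) in integer pixels, with x1 and y1 exclusive.
    """
    x0, y0, x1, y1 = box
    bh, bw = y1 - y0, x1 - x0
    probs = 1.0 / (1.0 + np.exp(-mask_logits))            # sigmoid
    m = probs.shape[0]
    rows = np.minimum((np.arange(bh) * m) // bh, m - 1)    # source row per target row
    cols = np.minimum((np.arange(bw) * m) // bw, m - 1)    # source col per target col
    resized = probs[rows[:, None], cols[None, :]]          # nearest-neighbour resize
    full = np.zeros(image_hw, dtype=bool)
    full[y0:y1, x0:x1] = resized >= threshold
    return full

logits = np.full((28, 28), -4.0)
logits[:, :14] = 4.0  # left half of the RoI is foreground
mask = paste_mask(logits, box=(10, 20, 50, 80), image_hw=(100, 100))
```

The resulting mask covers the left half of the 40×60 box and nothing outside it.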
6
Loss on masks?
📊 medium
Answer: Per-pixel sigmoid + BCE on the target class mask only (not softmax over all classes per pixel in the classic formulation).
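A minimal NumPy sketch of this loss for a single RoI, assuming logits of shape (K, m, m); the function name `mask_loss` and the random inputs are illustrative only.

```python
import numpy as np

def mask_loss(mask_logits, gt_mask, gt_class):
    """Mean per-pixel BCE on the ground-truth class channel only.

    mask_logits: (K, m, m) raw scores, one channel per class.
    gt_mask:     (m, m) binary target for this RoI.
    gt_class:    index of the RoI's ground-truth class.
    """
    z = mask_logits[gt_class]
    t = gt_mask.astype(float)
    # Numerically stable BCE with logits: max(z, 0) - z*t + log(1 + exp(-|z|))
    loss = np.maximum(z, 0) - z * t + np.log1p(np.exp(-np.abs(z)))
    return loss.mean()

rng = np.random.default_rng(0)
logits = rng.normal(size=(3, 28, 28))
target = rng.integers(0, 2, size=(28, 28))
base = mask_loss(logits, target, gt_class=1)

# Changing another class's channel leaves the loss untouched.
logits[2] += 100.0
same = mask_loss(logits, target, gt_class=1)
```

Because only the ground-truth channel enters the loss, classes do not compete per pixel, which is the contrast with a per-pixel softmax.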
7
Can two instance masks overlap in GT?
⚡ easy
Answer: Yes, for example when one object is in front of another and the annotations of both cover the shared region. The model must either predict a depth ordering or output independent masks per instance.
8
Panoptic segmentation?
📊 medium
Answer: Unifies semantic “stuff” and instance “things” with non-overlapping full-scene labeling—each pixel has one label + optional instance id.
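One simple way to obtain a non-overlapping labeling from overlapping instance masks is to paste them in descending score order over the stuff prediction. The sketch below is a simplified heuristic under that assumption, not a specific benchmark's reference procedure; `merge_panoptic` and the class ids are hypothetical.

```python
import numpy as np

def merge_panoptic(semantic, instances):
    """semantic: (H, W) stuff class ids. instances: list of (score, class_id, mask).

    Returns (class_map, instance_map) with exactly one label per pixel.
    """
    class_map = semantic.copy()
    instance_map = np.zeros_like(semantic)  # 0 means "no instance" (stuff)
    taken = np.zeros(semantic.shape, dtype=bool)
    ordered = sorted(instances, key=lambda t: t[0], reverse=True)
    for inst_id, (score, cls, mask) in enumerate(ordered, start=1):
        free = mask & ~taken  # higher-scoring masks win contested pixels
        class_map[free] = cls
        instance_map[free] = inst_id
        taken |= free
    return class_map, instance_map

sky = np.full((4, 4), 7)  # hypothetical stuff class id
a = np.zeros((4, 4), dtype=bool); a[:, :2] = True
b = np.zeros((4, 4), dtype=bool); b[:, 1:3] = True
class_map, instance_map = merge_panoptic(sky, [(0.6, 1, b), (0.9, 1, a)])
```

In this toy case the two masks contest column 1; the higher-scoring instance keeps it, and the last column remains stuff with instance id 0.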
9
What is YOLACT?
📊 medium
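Answer: A one-stage method: the network predicts a set of image-wide prototype masks plus a vector of coefficients per instance, and each instance mask is a linear combination of the prototypes. It trades some mask quality for speed.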
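The assembly step can be sketched in a few lines of NumPy. The shapes, the hand-made prototypes, and the name `assemble_masks` are assumptions for illustration; the cropping of each mask to its predicted box is omitted.

```python
import numpy as np

def assemble_masks(prototypes, coeffs):
    """prototypes: (H, W, k); coeffs: (n, k). Returns (n, H, W) mask probabilities."""
    logits = np.einsum("hwk,nk->nhw", prototypes, coeffs)  # linear combination
    return 1.0 / (1.0 + np.exp(-logits))                   # sigmoid

# Two hand-made prototypes with values in {-1, +1}.
protos = -np.ones((4, 4, 2))
protos[:, :2, 0] = 1.0  # prototype 0 fires on the left half
protos[:2, :, 1] = 1.0  # prototype 1 fires on the top half

coeffs = np.array([
    [4.0, 0.0],  # instance 0 uses only prototype 0
    [0.0, 4.0],  # instance 1 uses only prototype 1
])
masks = assemble_masks(protos, coeffs) > 0.5
```

Since the prototypes are shared across instances, the per-instance cost is a single matrix product, which is where the speed advantage comes from.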
10
SOLO / SOLOv2 idea?
🔥 hard
Answer: Define instance by grid location and scale—predict category and mask for each grid cell without anchors in the traditional sense.
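A minimal sketch of the location-to-cell mapping, assuming an S×S grid and a row-major channel index k = i·S + j. The function names are hypothetical, and the sketch ignores the center-region sampling and the per-scale assignment used in practice.

```python
def grid_cell(center_xy, image_wh, S):
    """Map an object center to its (row, col) cell in an S x S grid."""
    cx, cy = center_xy
    w, h = image_wh
    col = min(int(cx / w * S), S - 1)
    row = min(int(cy / h * S), S - 1)
    return row, col

def mask_channel(row, col, S):
    """Index of the mask output channel responsible for this cell (row-major)."""
    return row * S + col

row, col = grid_cell(center_xy=(320, 100), image_wh=(640, 480), S=12)
channel = mask_channel(row, col, S=12)
```

The object centered at (320, 100) lands in cell (2, 6), so its category is predicted at that cell and its mask in channel 30.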
11
DETR for masks?
🔥 hard
Answer: Set prediction with mask head or panoptic head—queries attend to image features to produce instance masks end-to-end.
12
What is mask AP?
📊 medium
Answer: Average precision computed with mask IoU instead of box IoU. It is the primary COCO metric for instance segmentation, averaged over IoU thresholds from 0.50 to 0.95 in steps of 0.05.
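The IoU part of the metric can be sketched as below. This only shows how one prediction is matched against one ground-truth mask at each threshold; the full AP additionally ranks predictions by score and integrates the precision-recall curve. The masks and names are made up for illustration.

```python
import numpy as np

def mask_iou(a, b):
    """Intersection over union of two boolean masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union > 0 else 0.0

pred = np.zeros((10, 10), dtype=bool); pred[2:8, 2:8] = True
gt = np.zeros((10, 10), dtype=bool); gt[3:9, 2:8] = True

iou = mask_iou(pred, gt)                  # 30 / 42
thresholds = np.linspace(0.50, 0.95, 10)  # 0.50, 0.55, ..., 0.95
is_match = iou >= thresholds              # true positive at each threshold?
```

With an IoU of about 0.71, this prediction counts as a true positive at the five loosest thresholds and as a false positive at the rest.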
13
Polygon vs raster?
⚡ easy
Answer: Datasets may store COCO RLE or polygons; training often rasterizes to fixed resolution masks for loss.
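As an illustration of the run-length idea, here is a small sketch of an uncompressed encoding in the style of COCO: runs alternate between 0s and 1s, start with a run of 0s, and traverse the mask in column-major order. The dictionary layout and function names are assumptions for this sketch; production code would use the dataset's own tooling.

```python
import numpy as np

def rle_encode(mask):
    """Encode a binary mask as alternating run lengths, starting with zeros."""
    flat = mask.astype(np.uint8).ravel(order="F")  # column-major traversal
    counts, prev, run = [], 0, 0
    for v in flat:
        if v != prev:
            counts.append(run)
            run, prev = 0, v
        run += 1
    counts.append(run)
    return {"size": list(mask.shape), "counts": counts}

def rle_decode(rle):
    """Rebuild the boolean mask from its run lengths."""
    h, w = rle["size"]
    flat = np.zeros(h * w, dtype=np.uint8)
    pos, val = 0, 0
    for c in rle["counts"]:
        flat[pos:pos + c] = val
        pos += c
        val = 1 - val
    return flat.reshape((h, w), order="F").astype(bool)

mask = np.zeros((3, 3), dtype=bool)
mask[:, 1] = True  # middle column is foreground
rle = rle_encode(mask)
restored = rle_decode(rle)
```

For the 3×3 mask with its middle column set, the encoding is three runs of length 3, and decoding restores the original mask.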
14
COCO stuff vs things?
📊 medium
Answer: Things are countable instances; stuff is amorphous (grass, sky)—panoptic benchmark merges both.
15
Small instances?
📊 medium
Answer: High-res FPN levels, copy-paste augmentation, and specialized heads help—same challenges as object detection.
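The core of copy-paste augmentation can be sketched as follows, assuming source and destination images of the same size. The function `copy_paste` is hypothetical, and the random scaling and placement that the augmentation normally applies are left out.

```python
import numpy as np

def copy_paste(src_img, src_mask, dst_img, dst_masks):
    """Paste one source instance onto dst; occluded dst masks lose those pixels."""
    out = dst_img.copy()
    out[src_mask] = src_img[src_mask]              # copy the instance's pixels
    updated = [m & ~src_mask for m in dst_masks]   # pasted object occludes others
    return out, updated + [src_mask.copy()]

src_img = np.full((4, 4, 3), 255, dtype=np.uint8)
src_mask = np.zeros((4, 4), dtype=bool); src_mask[1:3, 1:3] = True
dst_img = np.zeros((4, 4, 3), dtype=np.uint8)
dst_mask = np.zeros((4, 4), dtype=bool); dst_mask[:, :2] = True

new_img, new_masks = copy_paste(src_img, src_mask, dst_img, [dst_mask])
```

The existing mask shrinks where the pasted object covers it, so the ground truth stays consistent with the new image.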
16
Why slower than detection?
⚡ easy
Answer: Extra per-RoI mask computation and higher memory—one-stage mask methods aim to close the gap.
17
Role of FPN?
📊 medium
Answer: Provides multi-scale proposals and features, so each RoI is pooled from a pyramid level matched to its size and both small and large instances get good mask features.
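A common rule for choosing the pyramid level of an RoI is the one from the FPN paper, k = ⌊k0 + log2(√(wh) / 224)⌋, clamped to the available levels. The sketch below assumes levels P2 to P5 and k0 = 4; the function name is illustrative.

```python
import math

def fpn_level(w, h, k0=4, canonical=224, k_min=2, k_max=5):
    """Pyramid level for an RoI of size w x h (in input-image pixels)."""
    k = math.floor(k0 + math.log2(math.sqrt(w * h) / canonical))
    return max(k_min, min(k_max, k))  # clamp to the levels that exist

levels = {size: fpn_level(size, size) for size in (32, 112, 224, 448, 1000)}
```

Small RoIs map to the high-resolution level P2 and large ones to the coarse level P5, which is how small instances still receive fine-grained mask features.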
18
HTC / Cascade?
🔥 hard
Answer: Iteratively refine boxes and masks through cascaded stages with information flow between the box and mask branches; these models were state of the art on COCO leaderboards of their era.
19
Refine boundaries?
🔥 hard
Answer: Methods like PointRend adaptively sample points on uncertain boundaries for fine mask prediction—better edges.
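The point-selection idea can be sketched by treating pixels whose mask logit is closest to zero (probability near 0.5) as the most uncertain. This is a simplified illustration of the selection step only; `most_uncertain_points` is a hypothetical helper, and the subsequent per-point re-prediction is omitted.

```python
import numpy as np

def most_uncertain_points(mask_logits, num_points):
    """Return (row, col) of the num_points pixels whose logit is closest to 0."""
    uncertainty = -np.abs(mask_logits)
    flat_idx = np.argsort(uncertainty, axis=None)[::-1][:num_points]
    rows, cols = np.unravel_index(flat_idx, mask_logits.shape)
    return list(zip(rows.tolist(), cols.tolist()))

# Toy logits: confident background on the left, confident foreground on the
# right, and an ambiguous boundary in column 2.
logits = np.tile(np.array([-3.0, -1.0, 0.2, 2.0, 4.0]), (3, 1))
points = most_uncertain_points(logits, num_points=3)
```

All three selected points fall on the boundary column, so extra computation is spent only where the coarse mask is ambiguous.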
20
Annotation?
⚡ easy
Answer: Instance masks are among the most expensive annotations to collect; interactive tools, synthetic data, and weak supervision are active research areas.
Instance Segmentation Cheat Sheet
Key model
- Mask R-CNN
- RoIAlign
Metric
- Mask AP
Fast
- YOLACT
- Query-based
💡 Pro tip: RoIAlign removes the sub-pixel misalignment that RoIPool's coordinate rounding introduces, which hurts mask accuracy.