Computer Vision Interview 20 essential Q&A Updated 2026

Instance Segmentation: 20 Essential Q&A

Separate masks per object instance—Mask R-CNN and the overlap problem.

~12 min read 20 questions Advanced
Mask R-CNN · RoIAlign · mask AP · FCOS
1 What is instance segmentation? ⚡ easy
Answer: Each object instance gets its own binary mask and class label. Two pixels labeled “person” belong to different instances if they lie on different people.
2 Semantic vs instance? 📊 medium
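Answer: Semantic segmentation assigns each pixel a class, so all objects of one class share a single region. Instance segmentation produces N masks for N objects, possibly of the same class, and keeps touching or overlapping objects apart with distinct instance IDs.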
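A toy illustration of the difference, assuming a 1×6 image row with two adjacent people (class id 1) and background 0. The data and the helper `to_semantic` are hypothetical, chosen only to show that collapsing instances into a class map loses the instance identity.

```python
# Semantic view: one class label per pixel; the two people merge into one blob.
semantic = [0, 1, 1, 1, 1, 0]

# Instance view: one binary mask per object, even though both share class 1.
instances = [
    {"class_id": 1, "mask": [0, 1, 1, 0, 0, 0]},  # person A
    {"class_id": 1, "mask": [0, 0, 0, 1, 1, 0]},  # person B
]

def to_semantic(instances, width):
    """Collapse instance masks into a per-pixel class map (instance IDs are lost)."""
    out = [0] * width
    for inst in instances:
        for x, on in enumerate(inst["mask"]):
            if on:
                out[x] = inst["class_id"]
    return out
```

Going from instances to a semantic map is trivial; the reverse is not, because the boundary between person A and person B is no longer recorded.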
3 How does Mask R-CNN extend Faster R-CNN? 📊 medium
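Answer: It adds a mask head in parallel with the box and class heads: a small FCN applied to each RoI predicts an m×m binary mask for each of the K classes, trained jointly with box regression and classification as a multi-task loss.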
4 Why RoIAlign? 🔥 hard
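Answer: RoIPool quantizes RoI coordinates to the feature-map grid, which misaligns the pooled features with the mask. RoIAlign skips the rounding and uses bilinear interpolation at exact floating-point locations, which is critical for pixel-accurate masks.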
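A minimal sketch of the sampling step only, assuming a 2D feature map stored as a list of rows. The function names are illustrative, and a full RoIAlign also pools several sample points per output bin, which is omitted here.

```python
import math

def bilinear_sample(feat, y, x):
    """Bilinearly interpolate a 2D feature map at a float location (y, x)."""
    h, w = len(feat), len(feat[0])
    # Clamp to the valid range so border samples stay defined.
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feat[y0][x0] * (1 - dx) + feat[y0][x1] * dx
    bottom = feat[y1][x0] * (1 - dx) + feat[y1][x1] * dx
    return top * (1 - dy) + bottom * dy

def quantized_sample(feat, y, x):
    """RoIPool-style lookup: snap to an integer cell first (floor here;
    the exact rounding rule varies by implementation)."""
    return feat[int(y)][int(x)]

feat = [[0.0, 10.0],
        [20.0, 30.0]]
aligned = bilinear_sample(feat, 0.5, 0.5)   # blends all four neighbours
snapped = quantized_sample(feat, 0.5, 0.5)  # reads only cell (0, 0)
```

The two lookups disagree whenever the sample point is not on a grid node, and that gap is the misalignment RoIAlign removes.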
5 Mask branch output? 📊 medium
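Answer: Typically 28×28 logits per RoI from a lightweight FCN. At inference the mask for the predicted class is passed through a sigmoid, resized to the RoI's box size, and binarized with a threshold (0.5 in Mask R-CNN).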
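A rough sketch of the resize-and-threshold step, assuming a small grid of logits and an integer box size. It uses nearest-neighbour resizing to stay short; real implementations typically interpolate bilinearly, and `paste_mask` is a hypothetical helper name.

```python
import math

def paste_mask(logits, box_h, box_w, threshold=0.5):
    """Resize an m x m logit grid to box_h x box_w and binarize it.

    Nearest-neighbour resizing is a simplification for illustration.
    """
    m_h, m_w = len(logits), len(logits[0])
    out = []
    for i in range(box_h):
        # Map the centre of output pixel i back to a source row.
        src_i = min(int((i + 0.5) * m_h / box_h), m_h - 1)
        row = []
        for j in range(box_w):
            src_j = min(int((j + 0.5) * m_w / box_w), m_w - 1)
            prob = 1.0 / (1.0 + math.exp(-logits[src_i][src_j]))
            row.append(1 if prob > threshold else 0)
        out.append(row)
    return out

toy_logits = [[2.0, -2.0],
              [-2.0, 2.0]]
binary = paste_mask(toy_logits, 4, 4)
```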
6 Loss on masks? 📊 medium
Answer: Per-pixel sigmoid + BCE on the target class mask only (not softmax over all classes per pixel in the classic formulation).
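A small sketch of that loss, assuming the mask head returns one logit grid per class and that the ground-truth class index is known. The function name and argument layout are assumptions for illustration, not a library API.

```python
import math

def mask_bce_loss(logits_per_class, target_mask, gt_class):
    """Mean per-pixel binary cross-entropy on the ground-truth class channel.

    The other class channels are ignored, so classes do not compete per pixel.
    """
    logits = logits_per_class[gt_class]
    total, count = 0.0, 0
    for logit_row, target_row in zip(logits, target_mask):
        for z, t in zip(logit_row, target_row):
            # Numerically stable form of BCE with logits.
            total += max(z, 0.0) - z * t + math.log1p(math.exp(-abs(z)))
            count += 1
    return total / count

# Two classes, a 1x1 mask: channel 0 is undecided, channel 1 is confident.
toy_logits = [[[0.0]], [[100.0]]]
toy_target = [[1]]
loss_class0 = mask_bce_loss(toy_logits, toy_target, gt_class=0)
loss_class1 = mask_bce_loss(toy_logits, toy_target, gt_class=1)
```

Because only the selected channel contributes, changing the logits of any other class leaves the loss unchanged.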
7 Can two instance masks overlap in GT? ⚡ easy
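Answer: It depends on the dataset. Some annotations allow overlap: amodal masks label the occluded part of an object, and COCO instance annotations can overlap where one object sits in front of another. Panoptic-style labels forbid it. Mask R-CNN predicts each instance mask independently, so predicted masks can overlap unless a post-processing step resolves the ordering.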
8 Panoptic segmentation? 📊 medium
Answer: Unifies semantic “stuff” and instance “things” with non-overlapping full-scene labeling—each pixel has one label + optional instance id.
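One common heuristic for producing a non-overlapping instance map from independent masks is to paint instances in descending score order and let earlier ones keep contested pixels. The sketch below assumes each instance is a dict with hypothetical `score` and `mask` fields; it is not the official procedure of any benchmark.

```python
def merge_instances(instances, height, width):
    """Return an id map where each pixel holds at most one instance id (0 = none).

    Higher-scoring instances are painted first and keep any contested pixels.
    """
    id_map = [[0] * width for _ in range(height)]
    ordered = sorted(instances, key=lambda inst: inst["score"], reverse=True)
    for inst_id, inst in enumerate(ordered, start=1):
        for i in range(height):
            for j in range(width):
                if inst["mask"][i][j] and id_map[i][j] == 0:
                    id_map[i][j] = inst_id
    return id_map

toy_instances = [
    {"score": 0.8, "mask": [[0, 1, 1]]},
    {"score": 0.9, "mask": [[1, 1, 0]]},
]
panoptic_ids = merge_instances(toy_instances, 1, 3)
```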
9 What is YOLACT? 📊 medium
Answer: A one-stage method: it predicts a set of image-wide prototype masks plus per-instance coefficients, then linearly combines them to form each instance mask. It trades some mask quality for speed.
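A minimal sketch of the mask assembly step, assuming k prototype maps of equal size and k coefficients for one detected instance. Cropping to the predicted box and thresholding are left out, and `assemble_mask` is an illustrative name.

```python
import math

def assemble_mask(prototypes, coeffs):
    """Linearly combine k HxW prototypes with k coefficients, then apply a sigmoid."""
    height, width = len(prototypes[0]), len(prototypes[0][0])
    mask = []
    for i in range(height):
        row = []
        for j in range(width):
            s = sum(c * proto[i][j] for c, proto in zip(coeffs, prototypes))
            row.append(1.0 / (1.0 + math.exp(-s)))
        mask.append(row)
    return mask

# Two 1x2 prototypes: one fires on the left pixel, the other on the right.
toy_prototypes = [[[1.0, 0.0]], [[0.0, 1.0]]]
left_instance = assemble_mask(toy_prototypes, [5.0, -5.0])
```

Since the prototypes are computed once per image, each extra instance costs only a small weighted sum, which is where the speed comes from.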
10 SOLO / SOLOv2 idea? 🔥 hard
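Answer: Instances are defined by their location and size: the image is divided into a grid, and the cell containing an object's center predicts its category and mask, with no anchors or box proposals. Different pyramid levels handle different object scales.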
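The location-to-cell assignment can be sketched as below, assuming an S×S grid over the image and a single object center. This simplification ignores the center-region sampling used during training, and `grid_cell` is a hypothetical helper.

```python
def grid_cell(center_x, center_y, img_w, img_h, S):
    """Map an object centre to its (row, col) grid cell and flat channel index."""
    col = min(int(center_x / img_w * S), S - 1)
    row = min(int(center_y / img_h * S), S - 1)
    # Cell (row, col) is responsible for mask channel row * S + col.
    return row, col, row * S + col

cell = grid_cell(250, 50, 500, 500, 5)
```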
11 DETR for masks? 🔥 hard
Answer: Set prediction with mask head or panoptic head—queries attend to image features to produce instance masks end-to-end.
12 What is mask AP? 📊 medium
Answer: Average precision computed with mask IoU instead of box IoU. It is the primary COCO metric for instance segmentation, averaged over IoU thresholds from 0.50 to 0.95 in steps of 0.05.
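The quantity that replaces box IoU can be sketched as follows, assuming two binary masks of the same shape stored as nested lists. The threshold list mirrors the COCO convention of 0.50 to 0.95 in steps of 0.05; the full AP computation (matching, precision-recall curves) is not shown.

```python
def mask_iou(mask_a, mask_b):
    """Intersection over union of two same-shaped binary masks."""
    inter = union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            inter += 1 if (a and b) else 0
            union += 1 if (a or b) else 0
    return inter / union if union else 0.0

# COCO-style IoU thresholds at which a prediction counts as a match.
IOU_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]

pred = [[1, 1, 0, 0]]
gt = [[0, 1, 1, 0]]
iou = mask_iou(pred, gt)
```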
13 Polygon vs raster? ⚡ easy
Answer: Datasets may store COCO RLE or polygons; training often rasterizes to fixed resolution masks for loss.
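A sketch of uncompressed run-length encoding in the style COCO uses, assuming column-major pixel order and a count list that always starts with the run of zeros. This is an illustration of the idea, not a replacement for the official tooling, and the function names are hypothetical.

```python
def rle_encode(mask):
    """Encode a binary mask (list of rows) as alternating run lengths, zeros first."""
    height, width = len(mask), len(mask[0])
    # Flatten column by column (column-major order).
    flat = [mask[i][j] for j in range(width) for i in range(height)]
    counts, prev, run = [], 0, 0
    for value in flat:
        if value != prev:
            counts.append(run)
            prev, run = value, 0
        run += 1
    counts.append(run)
    return {"size": [height, width], "counts": counts}

def rle_decode(rle):
    """Invert rle_encode back into a list of rows."""
    height, width = rle["size"]
    flat, value = [], 0
    for count in rle["counts"]:
        flat.extend([value] * count)
        value = 1 - value
    return [[flat[j * height + i] for j in range(width)] for i in range(height)]

toy_mask = [[0, 1],
            [1, 1]]
encoded = rle_encode(toy_mask)
```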
14 COCO stuff vs things? 📊 medium
Answer: Things are countable instances; stuff is amorphous (grass, sky)—panoptic benchmark merges both.
15 Small instances? 📊 medium
Answer: High-res FPN levels, copy-paste augmentation, and specialized heads help—same challenges as object detection.
16 Why slower than detection? ⚡ easy
Answer: Extra per-RoI mask computation and higher memory—one-stage mask methods aim to close the gap.
17 Role of FPN? 📊 medium
Answer: It provides multi-scale features and proposals, so each RoI is pooled from a pyramid level that matches its size, and both small and large instances get usable mask features.
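The usual rule for choosing a pyramid level per RoI can be sketched as below, assuming the FPN paper's heuristic with a canonical size of 224 and levels P2 to P5. Parameter names are illustrative, and individual frameworks may use different defaults.

```python
import math

def fpn_level(box_w, box_h, k0=4, k_min=2, k_max=5, canonical=224):
    """Pick the pyramid level for an RoI: larger boxes map to coarser levels."""
    k = math.floor(k0 + math.log2(math.sqrt(box_w * box_h) / canonical))
    return max(k_min, min(k_max, k))

levels = [fpn_level(s, s) for s in (32, 112, 224, 1000)]
```

Small RoIs land on the high-resolution level P2, which is why FPN helps small-instance masks in particular.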
18 HTC / Cascade? 🔥 hard
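Answer: Cascade Mask R-CNN and Hybrid Task Cascade (HTC) iteratively refine boxes and masks through cascaded stages, with HTC adding fusion between the box and mask tasks. They were state of the art on the COCO leaderboards of their era.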
19 Refine boundaries? 🔥 hard
Answer: Methods like PointRend adaptively sample points on uncertain boundaries for fine mask prediction—better edges.
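The point-selection idea can be sketched as follows, assuming a coarse grid of foreground probabilities and taking uncertainty as closeness to 0.5. This covers only the selection at inference time; the per-point prediction head is not shown, and `most_uncertain_points` is a hypothetical name.

```python
def most_uncertain_points(prob, n):
    """Return the n (row, col) locations whose probability is closest to 0.5."""
    scored = [
        (abs(p - 0.5), i, j)
        for i, row in enumerate(prob)
        for j, p in enumerate(row)
    ]
    scored.sort()  # smallest distance to 0.5 first
    return [(i, j) for _, i, j in scored[:n]]

coarse_prob = [[0.95, 0.55],
               [0.48, 0.05]]
points = most_uncertain_points(coarse_prob, 2)
```

Confident interior and background pixels are skipped, so the extra computation concentrates on the boundary.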
20 Annotation? ⚡ easy
Answer: Instance masks are among the most expensive annotations to collect. Interactive tools, synthetic data, and weak supervision are active research areas for reducing that cost.

Instance Segmentation Cheat Sheet

Key model
  • Mask R-CNN
  • RoIAlign
Metric
  • Mask AP
Fast
  • YOLACT
  • Query-based

💡 Pro tip: RoIAlign removes the sub-pixel misalignment introduced by RoIPool's coordinate rounding, which otherwise hurts mask quality.
