Computer Vision Interview · 20 Essential Q&A · Updated 2026

Pose Estimation: 20 Essential Q&A

Localize body joints in 2D and 3D: heatmaps, grouping joints into people, and multi-person scenes.

20 questions · Advanced
Tags: COCO, heatmap, PAF, HRNet
1 What is pose estimation? ⚡ easy
Answer: Predict the locations of body joints (shoulders, elbows, knees, etc.) for each person in an image or video, either as 2D pixel coordinates or as a 3D body configuration.
2 Keypoint formats? 📊 medium
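Answer: Each keypoint is an (x, y) coordinate plus either a visibility flag (in annotations) or a confidence score (in predictions). COCO stores flat [x, y, v] triples where v = 0 means not labeled, 1 labeled but not visible, 2 labeled and visible. Each dataset fixes its skeleton topology (COCO: 17 keypoints).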
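A minimal sketch of reading COCO-style keypoints: the flat list is split into (x, y, v) triples, and the labeled count matches how COCO's `num_keypoints` field counts joints with v > 0. The function names are hypothetical helpers, not part of any COCO tool, and the example uses 3 joints instead of 17 for brevity.

```python
from typing import List, Tuple

# Visibility codes from the COCO keypoint annotation format.
NOT_LABELED, LABELED_OCCLUDED, LABELED_VISIBLE = 0, 1, 2


def parse_coco_keypoints(flat: List[float]) -> List[Tuple[float, float, int]]:
    """Split COCO's flat [x1, y1, v1, x2, y2, v2, ...] list into (x, y, v) triples."""
    if len(flat) % 3 != 0:
        raise ValueError("keypoint list length must be a multiple of 3")
    return [(flat[i], flat[i + 1], int(flat[i + 2])) for i in range(0, len(flat), 3)]


def num_labeled(triples: List[Tuple[float, float, int]]) -> int:
    """Count joints with v > 0 (labeled, whether visible or occluded)."""
    return sum(1 for _, _, v in triples if v > NOT_LABELED)


# Unlabeled joints are stored as x = y = v = 0.
kps = parse_coco_keypoints([120, 80, 2, 0, 0, 0, 135, 95, 1])
print(kps)               # [(120, 80, 2), (0, 0, 0), (135, 95, 1)]
print(num_labeled(kps))  # 2
```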
3 Heatmap regression? 📊 medium
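Answer: The network outputs one heatmap per joint, trained against a 2D Gaussian centered on the ground-truth location; the coordinate is read out with argmax or soft-argmax. Unlike direct coordinate regression, a heatmap keeps a spatial distribution, so it can express uncertainty (a wide or multi-peaked response).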
# argmax → integer (x, y) peak; soft-argmax (softmax-weighted mean of pixel coordinates) is differentiable and sub-pixel
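The readout described above can be sketched in plain Python: build a Gaussian training target, then decode it with a hard argmax and with a soft-argmax. The `sigma` and `beta` (softmax sharpness) values are illustrative assumptions, not settings from any particular paper.

```python
import math
from typing import List, Tuple

Heatmap = List[List[float]]  # indexed as heatmap[y][x]


def gaussian_target(h: int, w: int, cx: float, cy: float, sigma: float = 2.0) -> Heatmap:
    """Training target: an unnormalized 2D Gaussian peaking near 1.0 at the joint."""
    return [[math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
             for x in range(w)] for y in range(h)]


def hard_argmax(hm: Heatmap) -> Tuple[int, int]:
    """Integer (x, y) of the maximum value; quantized and not differentiable."""
    best = max((v, x, y) for y, row in enumerate(hm) for x, v in enumerate(row))
    return best[1], best[2]


def soft_argmax(hm: Heatmap, beta: float = 10.0) -> Tuple[float, float]:
    """Softmax over all pixels, then the expected (x, y); differentiable and sub-pixel."""
    m = max(v for row in hm for v in row)  # subtract the max for numerical stability
    weights = [[math.exp(beta * (v - m)) for v in row] for row in hm]
    total = sum(sum(row) for row in weights)
    ex = sum(x * wt for row in weights for x, wt in enumerate(row)) / total
    ey = sum(y * wt for y, row in enumerate(weights) for wt in row) / total
    return ex, ey


hm = gaussian_target(16, 16, cx=5.4, cy=9.0)
print(hard_argmax(hm))  # (5, 9): snapped to the pixel grid
print(soft_argmax(hm))  # close to (5.4, 9.0), with sub-pixel precision
```

With a small `beta`, low-valued background pixels carry more weight and pull the soft-argmax estimate toward the center of the map, so the sharpness is a real trade-off.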
4 COCO pose? ⚡ easy
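Answer: COCO Keypoints annotates 17 body keypoints per person (nose, eyes, ears, shoulders, elbows, wrists, hips, knees, ankles). It is the standard benchmark for multi-person 2D pose and the usual source of pretrained models.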
5 Top-down approach? 📊 medium
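Answer: Run a person detector first, then a single-person pose model on each cropped ROI. Accuracy is high when the detector is good, but runtime grows with the number of people, and a person the detector misses gets no pose at all.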
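The coordinate bookkeeping in a top-down pipeline can be sketched as follows: each detected box is padded to the pose model's input aspect ratio, and a peak found in that crop's heatmap is mapped back to image pixels. The 256x192 input and 64x48 heatmap sizes are assumed (a common configuration), the helper names are hypothetical, and half-pixel offset conventions are ignored. In a full pipeline these two steps run once per detected person, which is why cost grows with crowd size.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, width, height) in image pixels


def fit_aspect(box: Box, aspect: float = 192 / 256) -> Box:
    """Grow the box around its center so width / height == aspect (assumed model input ratio)."""
    x0, y0, w, h = box
    cx, cy = x0 + w / 2, y0 + h / 2
    if w / h > aspect:
        h = w / aspect  # too wide: grow height
    else:
        w = h * aspect  # too tall: grow width
    return cx - w / 2, cy - h / 2, w, h


def heatmap_to_image(px: float, py: float, box: Box,
                     hm_w: int = 48, hm_h: int = 64) -> Tuple[float, float]:
    """Map a peak (px, py) in the crop's heatmap back to original image coordinates."""
    x0, y0, w, h = box
    return x0 + px * w / hm_w, y0 + py * h / hm_h


crop = fit_aspect((100, 50, 60, 160))
print(crop)                            # (70.0, 50.0, 120.0, 160)
print(heatmap_to_image(24, 32, crop))  # (130.0, 130.0): heatmap center -> box center
```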
6 Bottom-up? 📊 medium
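Answer: Detect every joint in the image in one pass, then group the joints into people (OpenPose PAFs, Associative Embedding). The network cost does not depend on the number of people, so it scales better in crowds.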
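A minimal sketch of the grouping step in the spirit of Associative Embedding: every detected joint carries a 1D "tag", and joints with similar tags are assigned to the same person. This greedy version is a simplification (the published method matches per joint type with bipartite matching that also weighs detection scores); the `threshold` value and the joint names are illustrative assumptions.

```python
from typing import Dict, List, Tuple

Detection = Tuple[float, float, float]  # (x, y, tag): location plus 1D embedding value


def group_by_tags(dets: Dict[str, List[Detection]], order: List[str],
                  threshold: float = 1.0) -> List[Dict[str, Detection]]:
    """Greedy grouping: attach each joint to the person whose mean tag is closest."""
    people: List[Dict[str, Detection]] = []
    for joint in order:
        for det in dets.get(joint, []):
            best, best_dist = None, threshold
            for person in people:
                if joint in person:
                    continue  # each person gets at most one joint of each type
                mean_tag = sum(d[2] for d in person.values()) / len(person)
                dist = abs(det[2] - mean_tag)
                if dist < best_dist:
                    best, best_dist = person, dist
            if best is None:
                people.append({joint: det})  # no close tag: start a new person
            else:
                best[joint] = det
    return people


dets = {
    "head": [(50, 20, 0.1), (150, 22, 3.0)],
    "neck": [(152, 40, 2.9), (51, 41, 0.2)],
    "hip": [(49, 90, 0.0), (148, 95, 3.2)],
}
people = group_by_tags(dets, ["head", "neck", "hip"])
print(len(people))  # 2
```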
7 OpenPose PAFs? 🔥 hard
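Answer: Part affinity fields are 2D vector fields, one per limb type, that point along the limb direction at pixels lying on the limb. Each candidate pair of joints is scored by integrating the field along the segment between them, and pairs are then matched limb by limb (bipartite matching), which enables real-time multi-person 2D pose.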
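The PAF scoring step can be sketched by sampling points along the candidate segment and averaging the dot product between the field and the unit limb direction, a discrete approximation of the line integral. The field here is a hypothetical callable and `toy_paf` is an invented single-limb example; real implementations read the field from the network output tensor and typically also require a minimum fraction of samples above a threshold.

```python
import math
from typing import Callable, Tuple

Vec = Tuple[float, float]


def paf_score(paf: Callable[[float, float], Vec], a: Vec, b: Vec, samples: int = 10) -> float:
    """Approximate PAF line integral: mean of <field, unit limb vector> along a -> b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length = math.hypot(dx, dy)
    if length == 0:
        return 0.0
    ux, uy = dx / length, dy / length
    total = 0.0
    for i in range(samples):
        t = (i + 0.5) / samples  # midpoints of `samples` equal sub-segments
        fx, fy = paf(a[0] + t * dx, a[1] + t * dy)
        total += fx * ux + fy * uy
    return total / samples


def toy_paf(x: float, y: float) -> Vec:
    """Hypothetical field: one horizontal limb along y = 10 for x in [0, 20], pointing +x."""
    return (1.0, 0.0) if 0 <= x <= 20 and abs(y - 10) <= 1 else (0.0, 0.0)


print(paf_score(toy_paf, (2, 10), (18, 10)))  # 1.0: aligned with the limb
print(paf_score(toy_paf, (18, 10), (2, 10)))  # -1.0: right joints, wrong direction
```

A pair of joints from different people mostly samples background, so its score stays near zero.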
8 HRNet? 🔥 hard
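Answer: HRNet keeps a high-resolution branch through the whole network, runs lower-resolution branches in parallel, and repeatedly exchanges information across resolutions (multi-scale fusion). Unlike encoder-decoder designs that upsample from a low-resolution bottleneck, it never loses the high-resolution representation, which yields sharp heatmaps and strong 2D accuracy.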
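A toy sketch of one HRNet-style exchange between a high-resolution and a half-resolution branch: each branch adds in the other branch resampled to its own size. The learned convolutions are replaced by identity here, and downsampling uses 2x2 averaging as a stand-in for HRNet's strided 3x3 convolution; only the data flow is meant to be illustrative.

```python
from typing import List, Tuple

Grid = List[List[float]]


def upsample_nearest(g: Grid) -> Grid:
    """2x nearest-neighbor upsampling (HRNet pairs this with a learned 1x1 conv)."""
    return [[g[y // 2][x // 2] for x in range(2 * len(g[0]))] for y in range(2 * len(g))]


def downsample_avg(g: Grid) -> Grid:
    """2x downsampling by 2x2 averaging (stand-in for a learned strided 3x3 conv)."""
    return [[(g[2 * y][2 * x] + g[2 * y][2 * x + 1]
              + g[2 * y + 1][2 * x] + g[2 * y + 1][2 * x + 1]) / 4
             for x in range(len(g[0]) // 2)]
            for y in range(len(g) // 2)]


def add(a: Grid, b: Grid) -> Grid:
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def exchange(high: Grid, low: Grid) -> Tuple[Grid, Grid]:
    """One fusion step: each branch receives the other branch at its own resolution."""
    return add(high, upsample_nearest(low)), add(low, downsample_avg(high))


high = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
low = [[0, 0], [0, 2]]
new_high, new_low = exchange(high, low)
print(new_high)  # [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 2, 2], [0, 0, 2, 2]]
print(new_low)   # [[1.0, 0.0], [0.0, 2.0]]
```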
9 Loss functions? 📊 medium
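Answer: MSE between predicted and target heatmaps (often masked so unlabeled joints do not contribute), or L1 on regressed coordinates. Stacked hourglass networks add intermediate supervision, a loss after every hourglass stage, which helps train deep stacks.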
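A sketch of a per-joint weighted heatmap MSE plus hourglass-style intermediate supervision (the same loss summed over every stage's prediction). Zero weights for unlabeled joints are a common pattern, assumed here; without them the network is pushed toward all-zero maps for joints that were simply never annotated. The function names are hypothetical.

```python
from typing import List

Heatmap = List[List[float]]


def masked_heatmap_mse(pred: List[Heatmap], target: List[Heatmap], weight: List[float]) -> float:
    """Mean over all pixels of all joints of weight[j] * (pred - target)^2."""
    total, count = 0.0, 0
    for p_map, t_map, w in zip(pred, target, weight):
        for p_row, t_row in zip(p_map, t_map):
            for p, t in zip(p_row, t_row):
                total += w * (p - t) ** 2
                count += 1
    return total / count


def intermediate_supervision(stage_preds: List[List[Heatmap]], target: List[Heatmap],
                             weight: List[float]) -> float:
    """Sum the same loss over every stage's output, as in stacked hourglass training."""
    return sum(masked_heatmap_mse(p, target, weight) for p in stage_preds)


# Two joints, each with a tiny 1x2 heatmap; joint 1 is unlabeled (weight 0).
pred = [[[0.0, 1.0]], [[5.0, 5.0]]]
target = [[[0.0, 0.5]], [[0.0, 0.0]]]
weight = [1.0, 0.0]
print(masked_heatmap_mse(pred, target, weight))  # 0.0625
```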
10 Occlusion? 📊 medium
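Answer: Mark occluded joints with visibility flags and down-weight them in training, infer hidden joints from torso and whole-body context, and smooth keypoints over time in video. Heavy person-to-person overlap remains hard.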
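A minimal sketch of confidence-gated temporal smoothing for one tracked person: each joint follows an exponential moving average, and low-confidence detections (typically occluded joints) are skipped so the joint holds its last estimate. `alpha` and `min_conf` are illustrative assumptions. A plain moving average lags on fast motion; adaptive filters such as the One Euro filter are a common refinement.

```python
from typing import List, Optional, Tuple

Keypoint = Tuple[float, float, float]  # (x, y, confidence) from a per-frame 2D detector


def smooth_track(frames: List[List[Keypoint]], alpha: float = 0.5,
                 min_conf: float = 0.3) -> List[List[Optional[Tuple[float, float]]]]:
    """Per-joint exponential moving average across frames for one tracked person."""
    state: List[Optional[Tuple[float, float]]] = [None] * len(frames[0])
    out = []
    for frame in frames:
        for j, (x, y, c) in enumerate(frame):
            if c < min_conf:
                continue  # unreliable (e.g. occluded): hold the last smoothed position
            if state[j] is None:
                state[j] = (x, y)
            else:
                px, py = state[j]
                state[j] = (px + alpha * (x - px), py + alpha * (y - py))
        out.append(list(state))  # None = joint not yet seen with enough confidence
    return out


track = smooth_track([[(10, 10, 0.9)], [(20, 20, 0.9)], [(100, 100, 0.1)]])
print(track)  # [[(10, 10)], [(15.0, 15.0)], [(15.0, 15.0)]]
```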
11 Multi-person overlap? 📊 medium
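Answer: Apply NMS to person boxes or to poses (e.g. OKS-based pose NMS), solve joint association as a graph-matching problem, or use transformer decoders that predict a set of poses directly (PETR-style end-to-end approaches).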
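Standard greedy box NMS can be sketched as below. It also shows why crowds are hard: two genuinely different people standing close together have high box IoU, and the lower-scoring one is suppressed, which is one motivation for comparing poses (OKS) instead of boxes. The 0.5 threshold is an illustrative default.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float], thresh: float = 0.5) -> List[int]:
    """Greedy NMS: visit boxes by score; keep one only if it overlaps no kept box by > thresh."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= thresh for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 20), (1, 0, 11, 20), (30, 0, 40, 20)]
print(nms(boxes, [0.9, 0.8, 0.7]))  # [0, 2]: box 1 overlaps box 0 with IoU ~0.82
```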
12 3D pose? 🔥 hard
Answer: Regress camera-space 3D joint coordinates directly, or predict volumetric (voxel) heatmaps. A single image does not determine depth, so methods rely on depth sensors, multiple views, or full or weak 3D supervision.
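Why a single image does not determine depth can be shown with pinhole projection: every point on the ray through a pixel projects to that same pixel. The intrinsics `f`, `cx`, `cy` and the wrist coordinates are hypothetical values chosen for illustration.

```python
from typing import Tuple

Point3 = Tuple[float, float, float]  # camera-space (X, Y, Z) in meters, Z > 0 in front


def project(p: Point3, f: float = 1000.0, cx: float = 320.0, cy: float = 240.0) -> Tuple[float, float]:
    """Pinhole projection to pixel coordinates with hypothetical intrinsics."""
    X, Y, Z = p
    return f * X / Z + cx, f * Y / Z + cy


wrist = (0.2, -0.1, 2.0)
farther = (0.5, -0.25, 5.0)  # the same wrist pushed 2.5x farther along its ray
print(project(wrist))    # (420.0, 190.0)
print(project(farther))  # the same pixel (up to float rounding)
```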
13 Lifting 2D→3D? 📊 medium
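Answer: Detect 2D keypoints first, then lift them to 3D using skeleton constraints plus a camera model, or a prior learned from monocular sequences. VideoPose3D applies temporal convolutions to 2D keypoint sequences; VIBE instead regresses SMPL body-model parameters from video features, guided by an adversarial motion prior.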
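How skeleton constraints plus a camera model recover depth can be sketched with Taylor's classic bone-length method under scaled orthographic projection (u = s·X, v = s·Y): the 2D length of a bone fixes its in-plane extent, and the known 3D bone length fixes the remaining depth difference. The bone length and scale `s` are assumed known here; in practice both must be estimated. Note that only the magnitude comes out, so each bone has a front/back ambiguity.

```python
import math
from typing import Tuple


def relative_depth(a2d: Tuple[float, float], b2d: Tuple[float, float],
                   bone_length: float, scale: float) -> float:
    """|Z_b - Z_a| for one bone under scaled orthographic projection.

    The sign (toward or away from the camera) is NOT recoverable from one view.
    """
    du = (b2d[0] - a2d[0]) / scale  # in-plane extent in metric units
    dv = (b2d[1] - a2d[1]) / scale
    sq = bone_length ** 2 - (du ** 2 + dv ** 2)
    if sq < 0:
        raise ValueError("scale too small: projected bone longer than its 3D length")
    return math.sqrt(sq)


# Hypothetical 0.5 m bone seen at 100 px per meter.
print(relative_depth((0, 0), (30, 40), bone_length=0.5, scale=100.0))  # 0.0: bone parallel to image
print(relative_depth((0, 0), (18, 24), bone_length=0.5, scale=100.0))  # about 0.4 (either +0.4 or -0.4)
```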
14 MediaPipe / BlazePose? 📊 medium
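Answer: MediaPipe runs BlazePose as a lightweight on-device pipeline (a person detector followed by a landmark model) for mobile AR and fitness. It predicts a 33-point topology, which covers COCO's 17 body points plus extra face, hand, and foot points, and runs in real time on phone GPUs.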
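Because the 33-point topology contains all 17 COCO body points, BlazePose output can be reduced to COCO order for evaluation or for reuse with COCO-trained code. The index values below follow MediaPipe's pose landmark numbering as we understand it; verify them against the MediaPipe Pose documentation for the version you use. `to_coco17` is a hypothetical helper.

```python
from typing import List, Tuple

# BlazePose landmark index for each COCO keypoint, listed in COCO's order.
BLAZEPOSE_TO_COCO = {
    "nose": 0,
    "left_eye": 2, "right_eye": 5,
    "left_ear": 7, "right_ear": 8,
    "left_shoulder": 11, "right_shoulder": 12,
    "left_elbow": 13, "right_elbow": 14,
    "left_wrist": 15, "right_wrist": 16,
    "left_hip": 23, "right_hip": 24,
    "left_knee": 25, "right_knee": 26,
    "left_ankle": 27, "right_ankle": 28,
}


def to_coco17(landmarks: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Select COCO's 17 points (in COCO order) from 33 BlazePose (x, y) landmarks."""
    if len(landmarks) != 33:
        raise ValueError("expected 33 BlazePose landmarks")
    return [landmarks[i] for i in BLAZEPOSE_TO_COCO.values()]  # dicts keep insertion order


coco = to_coco17([(float(i), float(i)) for i in range(33)])
print(len(coco), coco[0], coco[-1])  # 17 (0.0, 0.0) (28.0, 28.0)
```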
15 Real-time? ⚡ easy
Answer: Use light backbones, lower input resolution, and a single-person mode; such models typically reach 30+ FPS on a GPU, which is enough for fitness apps.
16 Graph models? 🔥 hard
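Answer: Graph convolutional networks (GCNs) treat joints as nodes and bones as edges, so features propagate along the kinematic structure. They complement convolutional heatmap methods, especially for 2D-to-3D lifting.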
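A minimal sketch of one graph-convolution layer over a skeleton, using the standard Kipf and Welling normalization Â = D^(-1/2) (A + I) D^(-1/2) and H' = ReLU(Â H W); pose-specific GCNs use variants of this. The 3-joint chain (hip, knee, ankle) and the 1x1 weight matrix are hypothetical. For lifting, rows of H would hold each joint's 2D coordinates.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def normalized_adjacency(n: int, edges: List[Tuple[int, int]]) -> Matrix:
    """D^{-1/2} (A + I) D^{-1/2} for an undirected skeleton graph with n joints."""
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # self-loops
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0  # bones are undirected edges
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def gcn_layer(a_hat: Matrix, h: Matrix, w: Matrix) -> Matrix:
    """ReLU(A_hat @ H @ W); rows of H are per-joint feature vectors."""
    return [[max(0.0, v) for v in row] for row in matmul(matmul(a_hat, h), w)]


a_hat = normalized_adjacency(3, [(0, 1), (1, 2)])  # hip - knee - ankle
out = gcn_layer(a_hat, [[1.0], [0.0], [0.0]], [[1.0]])
print(out)  # hip and knee become nonzero; the ankle (two hops away) stays 0.0
```

Each layer spreads information one bone further along the kinematic chain, so deeper stacks let distant joints influence each other.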
17 OKS mAP? 📊 medium
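Answer: Object keypoint similarity (OKS) turns each joint's distance error into a score in [0, 1] using a Gaussian whose width grows with the person's scale (object area) and with a per-keypoint constant reflecting how precisely that joint can be annotated; scores are averaged over labeled joints. COCO keypoint AP averages precision over OKS thresholds 0.50:0.05:0.95, analogous to box IoU thresholds.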
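A sketch of OKS for one prediction against one ground-truth person, following the formula used by the COCO evaluation as we understand it: OKS = mean over labeled joints of exp(-d² / (2·area·k²)) with k = 2·sigma. The sigma values are the per-keypoint constants published with COCO; the handling of a person with no labeled joints (returning 0.0) is a simplification, not the official behavior.

```python
import math
from typing import List, Tuple

# COCO per-keypoint sigmas (nose, eyes, ears, shoulders, elbows, wrists, hips, knees, ankles).
COCO_SIGMAS = [s / 10.0 for s in (0.26, 0.25, 0.25, 0.35, 0.35, 0.79, 0.79, 0.72, 0.72,
                                  0.62, 0.62, 1.07, 1.07, 0.87, 0.87, 0.89, 0.89)]


def oks(pred: List[Tuple[float, float]], gt: List[Tuple[float, float, int]],
        area: float, sigmas: List[float] = COCO_SIGMAS) -> float:
    """Mean over labeled joints of exp(-d^2 / (2 * area * k^2)), where k = 2 * sigma."""
    total, labeled = 0.0, 0
    for (px, py), (gx, gy, v), s in zip(pred, gt, sigmas):
        if v == 0:
            continue  # unlabeled joints do not count
        k = 2 * s
        d2 = (px - gx) ** 2 + (py - gy) ** 2
        total += math.exp(-d2 / (2 * area * k ** 2))
        labeled += 1
    return total / labeled if labeled else 0.0


gt = [(10.0 * i, 5.0, 2) for i in range(17)]
perfect = [(x, y) for x, y, _ in gt]
shifted = [(x + 5.0, y) for x, y in perfect]
print(oks(perfect, gt, area=5000.0))  # 1.0
# The same 5-pixel error costs more on a smaller (farther) person:
print(oks(shifted, gt, area=1000.0) < oks(shifted, gt, area=5000.0))  # True
```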
18 Augmentation? ⚡ easy
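Answer: Random rotation and scale, horizontal flip with left/right joint swap, and cutout-style occlusion. Every transform must also be applied to the keypoints so the skeleton stays valid; after a flip, for example, the left-wrist coordinates must move to the right-wrist slot.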
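A sketch of horizontal flip with joint swap for COCO-ordered keypoints: x is mirrored, then the left/right entries trade places. The mirror mapping `width - 1 - x` assumes pixel-index coordinates (some codebases use `width - x`), and the helper names are hypothetical.

```python
from typing import List, Tuple

COCO_NAMES = ["nose", "left_eye", "right_eye", "left_ear", "right_ear",
              "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
              "left_wrist", "right_wrist", "left_hip", "right_hip",
              "left_knee", "right_knee", "left_ankle", "right_ankle"]


def flip_index(names: List[str]) -> List[int]:
    """For each joint, the index of its mirror joint (left <-> right; nose maps to itself)."""
    def mirror(n: str) -> str:
        if n.startswith("left_"):
            return "right_" + n[len("left_"):]
        if n.startswith("right_"):
            return "left_" + n[len("right_"):]
        return n
    return [names.index(mirror(n)) for n in names]


def hflip_keypoints(kps: List[Tuple[float, float, int]], image_width: int,
                    names: List[str] = COCO_NAMES) -> List[Tuple[float, float, int]]:
    """Mirror x (assumed pixel-index convention), then swap left/right labels."""
    mirrored = [(image_width - 1 - x, y, v) for x, y, v in kps]
    return [mirrored[j] for j in flip_index(names)]


kps = [(float(i), 0.0, 2) for i in range(17)]
flipped = hflip_keypoints(kps, image_width=100)
print(flip_index(COCO_NAMES)[:3])  # [0, 2, 1]
print(flipped[5])                  # (93.0, 0.0, 2): left_shoulder slot gets the mirrored right shoulder
```

Flipping twice returns the original keypoints, a cheap sanity check for any swap table.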
19 Mobile deployment? 📊 medium
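Answer: INT8 quantization, smaller input resolution, and ROI cropping (track the person and process only the crop) trade some accuracy for lower latency, power draw, and heat on edge devices.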
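What INT8 quantization does to a tensor can be sketched with generic affine (asymmetric, per-tensor) quantization: a float range is mapped onto [-128, 127] with a scale and a zero point, and the round-trip error is at most half a step for values inside the range. This is a textbook scheme, not the API of any specific framework; calibrating the range `[lo, hi]` from real activations is assumed to happen elsewhere.

```python
from typing import List, Tuple


def int8_params(lo: float, hi: float) -> Tuple[float, int]:
    """Scale and zero point mapping [lo, hi] onto int8 [-128, 127]."""
    lo, hi = min(lo, 0.0), max(hi, 0.0)  # include 0 so that 0.0 is exactly representable
    scale = (hi - lo) / 255.0 or 1.0    # fall back to 1.0 for an all-zero range
    zero_point = round(-128 - lo / scale)
    return scale, zero_point


def quantize(xs: List[float], scale: float, zp: int) -> List[int]:
    """q = clamp(round(x / scale) + zp, -128, 127)."""
    return [max(-128, min(127, round(x / scale) + zp)) for x in xs]


def dequantize(qs: List[int], scale: float, zp: int) -> List[float]:
    """x ~= (q - zp) * scale; error is at most scale / 2 for values that were not clamped."""
    return [(q - zp) * scale for q in qs]


scale, zp = int8_params(-1.0, 1.0)
xs = [-0.5, 0.0, 0.25, 0.9]
back = dequantize(quantize(xs, scale, zp), scale, zp)
print(back)  # close to xs; 0.0 maps back exactly
```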
20 Limitations? ⚡ easy
Answer: Rare poses are underrepresented in training data, loose clothing hides joints, and monocular 3D suffers from depth ambiguity (many 3D poses project to the same 2D image). Combine sensors or use multiple views when possible.

Pose Cheat Sheet

2D
  • Heatmaps
  • HRNet
Multi
  • Top-down
  • Bottom-up
3D
  • Lift / multi-view

💡 Pro tip: Be ready to compare heatmaps vs direct regression, and top-down vs bottom-up (PAF) grouping.
