Face & Pose Estimation — Interview Q&A

Question 1

1 Typical pipeline? ⚡ easy

Answer

Answer: Detect face → align to canonical pose → CNN embedding → compare cosine/L2 distance.

Question 2

2 Face detection? 📊 medium

Answer

Answer: Localize face bounding boxes across scales (MTCNN, RetinaFace, YuNet); the detector must handle profile views, small faces, and clutter before recognition can run.

Question 3

3 Alignment? 📊 medium

Answer

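Answer: Use 5 or more landmarks to fit a similarity transform (rotation, uniform scale, translation) that maps the face onto a fixed template; this removes in-plane rotation, scale, and position variance before embedding. It does not by itself correct lighting or out-of-plane pose.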
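
As a rough sketch of the alignment step, the least-squares similarity transform between detected landmarks and a template can be written with complex numbers, where each point (x, y) becomes z = x + iy and the transform is w = a*z + b. The function names and the toy 3-point example are illustrative assumptions, not a specific library's API; real pipelines usually call a library routine and then warp the image with the result.

```python
def fit_similarity(src, dst):
    """Least-squares similarity transform mapping src points onto dst points.

    Points are (x, y) tuples. With points as complex numbers, the transform
    is w = a * z + b: a encodes rotation and uniform scale, b translation.
    """
    zs = [complex(x, y) for x, y in src]
    ws = [complex(x, y) for x, y in dst]
    n = len(zs)
    z_mean = sum(zs) / n
    w_mean = sum(ws) / n
    # Closed-form least-squares solution on the centered point sets.
    num = sum((w - w_mean) * (z - z_mean).conjugate() for z, w in zip(zs, ws))
    den = sum(abs(z - z_mean) ** 2 for z in zs)
    a = num / den
    b = w_mean - a * z_mean
    return a, b


def apply_similarity(a, b, points):
    """Apply w = a * z + b to a list of (x, y) points."""
    out = []
    for x, y in points:
        w = a * complex(x, y) + b
        out.append((w.real, w.imag))
    return out


# Toy example: detected points are the template rotated 90 degrees,
# scaled by 2, and shifted by (3, 4).
template = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
detected = [(3.0, 4.0), (3.0, 6.0), (1.0, 4.0)]
a, b = fit_similarity(detected, template)
aligned = apply_similarity(a, b, detected)
```

Here abs(a) is the recovered scale factor (0.5 in the toy example) and the aligned points land back on the template.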

Question 4

4 What is an embedding? 📊 medium

Answer

Answer: L2-normalized vector (e.g. 512-D) such that same identity is close, different identities far—learned with metric objectives.
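
To make the "close vs far" comparison concrete, here is a minimal sketch of cosine similarity and L2 distance on normalized embeddings. The short 3-D vectors stand in for real 512-D embeddings and the helper names are assumptions for illustration. For unit vectors the two measures are tied together by d^2 = 2 - 2*cos, so thresholding one is equivalent to thresholding the other.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (normalizes first)."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def l2_distance(a, b):
    """Euclidean distance between the normalized vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


emb_a = [1.0, 2.0, 3.0]
emb_b = [2.0, 0.0, 1.0]
cos_ab = cosine_similarity(emb_a, emb_b)
dist_ab = l2_distance(emb_a, emb_b)
```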

Question 5

5 Verification vs identification? ⚡ easy

Answer

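Answer: Verification: decide whether two samples are the same person (1:1), using a score threshold. Identification: search a gallery for the probe's identity (1:N), evaluated with rank-based metrics and, in open-set use, a rejection threshold.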

Question 6

6 Open-set identification? 📊 medium

Answer

Answer: Probe may be unknown—need rejection option based on similarity threshold to avoid false accepts.
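
A minimal sketch of the rejection option, assuming unit-norm embeddings so that the dot product is the cosine similarity. The gallery contents, the names, and the threshold values are made up for illustration; a real threshold would come from validation data.

```python
def identify(probe, gallery, threshold):
    """Return (identity, score), or (None, score) if the best match is too weak.

    gallery maps identity name -> unit-norm embedding (list of floats).
    """
    best_name, best_score = None, -1.0
    for name, emb in gallery.items():
        score = sum(p * g for p, g in zip(probe, emb))
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score  # reject: probe is treated as unknown
    return best_name, best_score


gallery = {"alice": [1.0, 0.0], "bob": [0.0, 1.0]}
probe = [0.8, 0.6]  # unit norm; closest to "alice" with similarity 0.8
accepted = identify(probe, gallery, threshold=0.7)
rejected = identify(probe, gallery, threshold=0.9)
```

With the looser threshold the probe is matched to "alice"; with the stricter one the same probe is rejected as unknown.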

Question 7

7 Triplet loss? 🔥 hard

Answer

Answer: Pull the anchor closer to the positive than to the negative by at least a margin; triplet selection is critical for convergence, and FaceNet mines semi-hard negatives online within each batch.
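
The margin condition can be sketched for a single triplet as follows, using squared L2 distance. The margin of 0.2 and the 2-D toy embeddings are assumptions for illustration; the semi-hard test shown (negative farther than the positive but still inside the margin) is one common selection rule, not the only one.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge loss: zero once the negative is farther than positive + margin."""
    d_ap = squared_distance(anchor, positive)
    d_an = squared_distance(anchor, negative)
    return max(0.0, d_ap - d_an + margin)


def is_semi_hard(anchor, positive, negative, margin=0.2):
    """True if d(a,p) < d(a,n) < d(a,p) + margin."""
    d_ap = squared_distance(anchor, positive)
    d_an = squared_distance(anchor, negative)
    return d_ap < d_an < d_ap + margin


anchor, positive = [0.0, 0.0], [0.1, 0.0]
easy_negative = [1.0, 0.0]       # far away: contributes no loss
semi_hard_negative = [0.3, 0.0]  # inside the margin: contributes loss
```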

Question 8

8 ArcFace? 🔥 hard

Answer

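Answer: Adds an additive angular margin m to the angle θ between the L2-normalized embedding and its target class weight, so the target logit becomes s·cos(θ + m); this tightens each class and widens inter-class angular separation on the hypersphere. It is a margin-based softmax loss and a widely used strong baseline.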
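
A minimal sketch of how the angular margin changes the logits before softmax, assuming the cosines between the normalized embedding and each normalized class weight are already computed. The scale s=64 and margin m=0.5 are assumed defaults for illustration; practical implementations also add safeguards for the case where θ + m exceeds π, which this sketch omits.

```python
import math


def arcface_logits(cosines, target, s=64.0, m=0.5):
    """Apply an additive angular margin to the target class only.

    cosines[j] is cos(theta_j) between the embedding and class j's weight.
    Non-target logits are s * cos(theta_j); the target logit is
    s * cos(theta_target + m), which is smaller and so harder to satisfy.
    """
    logits = []
    for j, c in enumerate(cosines):
        if j == target:
            theta = math.acos(max(-1.0, min(1.0, c)))  # clamp for safety
            logits.append(s * math.cos(theta + m))
        else:
            logits.append(s * c)
    return logits


cosines = [0.5, 0.8]  # class 1 is the ground-truth class
logits = arcface_logits(cosines, target=1)
```

The target logit ends up below its unpenalized value s * 0.8, so the network has to shrink the target angle further to reach the same softmax probability.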

Question 9

9 FaceNet? 📊 medium

Answer

Answer: End-to-end CNN trained with triplet loss to produce compact 128-D embeddings; popularized deep face recognition at scale.

Question 10

10 Benchmarks? ⚡ easy

Answer

Answer: LFW, CFP-FP, IJB-C, MegaFace; they differ in pose variation, protocol (1:1 verification vs 1:N identification), and difficulty. Report TAR@FAR for verification.

Question 11

11 Threshold tuning? 📊 medium

Answer

Answer: Set operating point on validation to balance FAR vs FRR for the deployment constraint (access control vs convenience).
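
Choosing the operating point can be sketched as a scan over candidate thresholds on validation scores. The score lists, the candidate thresholds, and the FAR budget below are made-up values for illustration. Because FAR can only fall as the threshold rises, the lowest threshold that meets the FAR budget also gives the lowest FRR under that budget.

```python
def far_frr(genuine_scores, impostor_scores, threshold):
    """FAR: impostor pairs accepted. FRR: genuine pairs rejected."""
    false_accepts = sum(1 for s in impostor_scores if s >= threshold)
    false_rejects = sum(1 for s in genuine_scores if s < threshold)
    return (false_accepts / len(impostor_scores),
            false_rejects / len(genuine_scores))


def pick_threshold(genuine_scores, impostor_scores, candidates, max_far):
    """Lowest candidate threshold whose FAR stays within max_far."""
    for t in sorted(candidates):
        far, frr = far_frr(genuine_scores, impostor_scores, t)
        if far <= max_far:
            return t, far, frr
    return None  # no candidate satisfies the constraint


genuine = [0.9, 0.8, 0.7, 0.4]
impostor = [0.1, 0.2, 0.3, 0.6]
operating_point = pick_threshold(genuine, impostor,
                                 candidates=[0.25, 0.5, 0.65], max_far=0.25)
```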

Question 12

12 Anti-spoofing? 📊 medium

Answer

Answer: Detect print/screen/replay attacks with texture, depth, or rPPG—required for liveness in banking kiosks.

Question 13

13 Masks / COVID era? 📊 medium

Answer

Answer: Periocular focus, synthetic mask augmentation, or dedicated training—lower accuracy if model not adapted.

Question 14

14 Demographic bias? 🔥 hard

Answer

Answer: Unequal error rates across groups—audit datasets, balanced training, and fairness constraints in deployment.

Question 15

15 Privacy? ⚡ easy

Answer

Answer: Biometric data is sensitive—encrypt templates, consent, retention limits, on-device processing where possible.

Question 16

16 3D morphable models? 📊 medium

Answer

Answer: Fit 3DMM for pose-invariant recognition or generate synthetic views—helps extreme pose.

Question 17

17 On-device? ⚡ easy

Answer

Answer: Quantized MobileFaceNet-style backbones run through NNAPI or Core ML; latency and power are the binding constraints.

Question 18

18 Quality assessment? 📊 medium

Answer

Answer: Blur, exposure, resolution gates before embedding—reject low-quality captures to reduce false matches.

Question 19

19 Synthetic faces? 📊 medium

Answer

Answer: GAN-generated diversity for training—watch for domain gap and identity leakage in synthetic sets.

Question 20

20 Presentation attacks? 📊 medium

Answer

Answer: ISO/IEC 30107 defines presentation attack detection terminology, attack instrument categories, and evaluation metrics; multimodal liveness (depth, IR) mitigates many attacks.

Question 21

21 What is pose estimation? ⚡ easy

Answer

Answer: Predict joint locations (shoulders, elbows, etc.) for people in an image/video—2D pixel coords or 3D body config.

Question 22

22 Keypoint formats? 📊 medium

Answer

Answer: xy coordinates, confidence, sometimes visibility flags—datasets define fixed skeleton topology (COCO 17 joints).

Question 23

23 Heatmap regression? 📊 medium

Answer

Answer: Per-joint Gaussian maps; argmax or soft-argmax for coordinate—preserves spatial uncertainty vs direct regression.
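
The difference between the two decoders can be sketched on a tiny heatmap stored as a list of rows. The temperature parameter beta and the toy heatmaps are assumptions for illustration. Hard argmax returns a single pixel, while soft-argmax returns the softmax-weighted mean location, which can fall between pixels and is differentiable.

```python
import math


def hard_argmax(heatmap):
    """(x, y) of the first maximum value in the heatmap."""
    best_value, best_xy = None, None
    for y, row in enumerate(heatmap):
        for x, v in enumerate(row):
            if best_value is None or v > best_value:
                best_value, best_xy = v, (x, y)
    return best_xy


def soft_argmax(heatmap, beta=1.0):
    """Expected (x, y) under softmax(beta * heatmap)."""
    peak = max(max(row) for row in heatmap)  # subtract for numerical stability
    total = ex = ey = 0.0
    for y, row in enumerate(heatmap):
        for x, v in enumerate(row):
            w = math.exp(beta * (v - peak))
            total += w
            ex += w * x
            ey += w * y
    return ex / total, ey / total


centered = [[0, 1, 0],
            [1, 5, 1],
            [0, 1, 0]]
between_pixels = [[0, 0, 0, 0],
                  [0, 3, 3, 0]]
soft_x, soft_y = soft_argmax(between_pixels)
```

On the second heatmap the response is split evenly across two columns: hard argmax snaps to one of them, whereas the soft-argmax x coordinate sits halfway between.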

Question 24

24 COCO pose? ⚡ easy

Answer

Answer: 17 body keypoints per person—standard for detection+pose benchmarks and pretrained models.

Question 25

25 Top-down approach? 📊 medium

Answer

Answer: Person detector first, then single-person pose inside each ROI—accurate when detector is good, slower with many people.

Question 26

26 Bottom-up? 📊 medium

Answer

Answer: Predict all joints then group into people (OpenPose PAFs, Associative Embedding)—better scaling in crowds.

Question 27

27 OpenPose PAFs? 🔥 hard

Answer

Answer: Part affinity fields encode limb orientation to connect candidate joints—enables real-time multi-person 2D pose.

Question 28

28 HRNet? 🔥 hard

Answer

Answer: Maintains high-resolution streams parallel to low-res with repeated fusions—sharp heatmaps, strong 2D accuracy.

Question 29

29 Loss functions? 📊 medium

Answer

Answer: MSE on heatmaps; or L1 on coords; auxiliary intermediate supervision in hourglass nets aids deep training.

Question 30

30 Occlusion? 📊 medium

Answer

Answer: Low visibility flags, context from torso, temporal smoothing in video—still hard for heavy overlap.

Question 31

31 Multi-person overlap? 📊 medium

Answer

Answer: NMS on detections; association graph solvers; transformer decoders predicting sets of poses (PETR-style ideas).
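
Of the options listed, greedy NMS on person boxes is the simplest to sketch. Boxes are assumed to be (x1, y1, x2, y2) tuples, and the 0.5 IoU threshold and the example boxes are illustrative choices. Note that an aggressive threshold can suppress a genuinely overlapping second person, which is part of why crowded scenes motivate the other approaches.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep boxes in score order unless they overlap a kept box."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
```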

Question 32

32 3D pose? 🔥 hard

Answer

Answer: Direct regression of camera-space joints or volumetric representations—needs depth, multi-view, or weak 3D supervision.

Question 33

33 Lifting 2D→3D? 📊 medium

Answer

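Answer: Use skeleton constraints plus a camera model, or a learned temporal prior over monocular sequences (VideoPose3D lifts 2D keypoint sequences with temporal convolutions; VIBE instead regresses SMPL body parameters from video features).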

Question 34

34 MediaPipe / BlazePose? 📊 medium

Answer

Answer: Lightweight graphs for mobile AR—33-point topology, real-time on phone GPUs.

Question 35

35 Real-time? ⚡ easy

Answer

Answer: Light backbones, lower input res, single-person mode—30+ FPS on GPU for fitness apps.

Question 36

36 Graph models? 🔥 hard

Answer

Answer: GCN over joints exploits kinematic structure—complements conv heatmap methods especially for 3D.

Question 37

37 OKS mAP? 📊 medium

Answer

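Answer: Object keypoint similarity (OKS) normalizes each keypoint's distance error by the object scale and a per-keypoint constant; COCO pose AP averages over OKS thresholds from 0.50 to 0.95 in steps of 0.05.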
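
A minimal sketch of the OKS computation for one predicted pose against one ground-truth pose, following the COCO-style form exp(-d^2 / (2 * area * k^2)) with k = 2 * sigma, averaged over labeled keypoints. The sigma value and coordinates in the example are placeholders; the real per-keypoint sigmas are fixed by the benchmark.

```python
import math


def oks(pred, gt, visibility, area, sigmas):
    """Object keypoint similarity in [0, 1].

    pred, gt:    lists of (x, y) keypoints in pixels
    visibility:  ground-truth flags; 0 means unlabeled and is skipped
    area:        ground-truth object area in pixels^2 (the scale term)
    sigmas:      per-keypoint falloff constants
    """
    total, count = 0.0, 0
    for (px, py), (gx, gy), v, sigma in zip(pred, gt, visibility, sigmas):
        if v <= 0:
            continue
        d2 = (px - gx) ** 2 + (py - gy) ** 2
        k = 2.0 * sigma
        total += math.exp(-d2 / (2.0 * area * k * k))
        count += 1
    return total / count if count else 0.0


gt = [(0.0, 0.0), (5.0, 5.0)]
pred = [(1.0, 1.0), (50.0, 50.0)]  # second keypoint is far off but unlabeled
score = oks(pred, gt, visibility=[2, 0], area=100.0, sigmas=[0.05, 0.05])
```

Because the distance is divided by the object area, the same pixel error costs more on a small person than on a large one.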

Question 38

38 Augmentation? ⚡ easy

Answer

Answer: Random rotation/scale, flip with joint swap, cutout—preserve skeleton validity after transform.
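
The "flip with joint swap" step can be sketched as below for the 17-keypoint COCO ordering, where index 0 is the nose and indices 1 to 16 alternate left and right parts. The function name is an assumption for illustration. Mirroring x alone would leave the left wrist labeled as left even though it now appears on the other side of the body, so left/right indices are swapped afterwards.

```python
# Left/right index pairs in COCO order: eyes, ears, shoulders, elbows,
# wrists, hips, knees, ankles.
COCO_FLIP_PAIRS = [(1, 2), (3, 4), (5, 6), (7, 8),
                   (9, 10), (11, 12), (13, 14), (15, 16)]


def hflip_keypoints(keypoints, image_width, flip_pairs=COCO_FLIP_PAIRS):
    """Horizontally flip (x, y) keypoints and swap left/right joints."""
    flipped = [(image_width - 1 - x, y) for x, y in keypoints]
    for left, right in flip_pairs:
        flipped[left], flipped[right] = flipped[right], flipped[left]
    return flipped


# Toy pose: keypoint i sits at x = i so the swap is easy to see.
pose = [(float(i), 0.0) for i in range(17)]
flipped_pose = hflip_keypoints(pose, image_width=100)
```

Applying the flip twice returns the original pose, which is a convenient sanity check for any custom skeleton's pair list.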

Question 39

39 Mobile deployment? 📊 medium

Answer

Answer: INT8 quant, smaller input, ROI cropping—trade accuracy for thermal/power on edge.

Question 40

40 Limitations? ⚡ easy

Answer

Answer: Rare poses are underrepresented, clothing hides joints, and single-view (monocular) 3D has inherent depth ambiguity; combine sensors or multi-view when possible.
