Object Tracking — Interview Q&A

Question 1

1 What is object tracking? ⚡ easy

Answer

Answer: Estimating object state over time in video—position, size, sometimes 3D pose—while preserving identity across frames.

Question 2

2 SOT vs MOT? 📊 medium

Answer

Answer: Single-object tracking: one target given init box. Multi-object tracking: many objects with IDs—requires association across detections.

Question 3

3 Tracking-by-detection? 📊 medium

Answer

Answer: Run detector each frame, link boxes into trajectories via association—modular and strong with good detectors.

Question 4

4 What is data association? 🔥 hard

Answer

Answer: Decide which detection belongs to which track—classic bipartite matching with cost matrix (IoU, appearance, motion).

Question 5

5 IoU matching? ⚡ easy

Answer

Answer: Match detections to each track's predicted box by IoU, greedily or with the Hungarian algorithm, keeping only pairs above a threshold. A simple baseline for MOT.
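As a rough illustration of IoU matching, the sketch below computes IoU for corner-format boxes and pairs tracks with detections greedily, best overlap first. The `(x1, y1, x2, y2)` box format, the function names, and the 0.3 threshold are assumptions made for this example.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_iou_match(track_boxes, det_boxes, iou_min=0.3):
    """Pair tracks and detections greedily, highest IoU first.

    Returns a list of (track_index, detection_index) pairs.
    """
    candidates = sorted(
        ((iou(t, d), ti, di)
         for ti, t in enumerate(track_boxes)
         for di, d in enumerate(det_boxes)),
        reverse=True,
    )
    used_tracks, used_dets, matches = set(), set(), []
    for score, ti, di in candidates:
        if score < iou_min:
            break  # every remaining candidate is below the threshold
        if ti in used_tracks or di in used_dets:
            continue
        used_tracks.add(ti)
        used_dets.add(di)
        matches.append((ti, di))
    return matches
```

Tracks and detections left out of the returned pairs are the unmatched ones that a tracker would age out or turn into new tracks.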

Question 6

6 Hungarian algorithm? 📊 medium

Answer

Answer: Solves assignment problem in O(n³)—global optimum for linear cost—used in SORT / DeepSORT association.

Question 7

7 What are ID switches? 📊 medium

Answer

Answer: Tracker swaps identities between objects—common near crossings; penalized in MOTA metric.

Question 8

8 Handle occlusion? 📊 medium

Answer

Answer: Predict motion during missing detections (Kalman), re-identify with appearance when visible again, or joint optimization over windows.

Question 9

9 What is drift in SOT? ⚡ easy

Answer

Answer: Small localization errors accumulate because the tracker updates its model from its own predictions; mitigated by periodic re-detection or a robust loss.

Question 10

10 Why Kalman filters? 📊 medium

Answer

Answer: Predict box between frames with constant-velocity model; update when measurements arrive—cheap smooth motion prior.

Question 11

11 Role of Re-ID features? 📊 medium

Answer

Answer: Cosine distance between appearance embeddings reduces ID switches when IoU is ambiguous (as in DeepSORT).

Question 12

12 What is MOTA? 🔥 hard

Answer

Answer: Multiple Object Tracking Accuracy—combines false positives, misses, and ID switches vs ground truth trajectories.
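The usual CLEAR MOT definition is MOTA = 1 − (FN + FP + IDSW) / GT, with all counts summed over every frame of the sequence. A minimal sketch of that formula follows; the function and argument names are chosen for this example.

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """MOTA from error counts summed over all frames.

    num_gt is the total number of ground-truth boxes in the sequence.
    """
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    errors = false_negatives + false_positives + id_switches
    return 1.0 - errors / num_gt
```

Because the errors are not bounded by the number of ground-truth boxes, the score can be negative for a tracker that produces many false positives.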

Question 13

13 Online tracking? ⚡ easy

Answer

Answer: Uses only past and current frames—needed for robotics/live video; batch methods use future frames (smoother but not causal).

Question 14

14 Classical KLT? 📊 medium

Answer

Answer: Track corner features with local flow—fast but fragile to appearance change; less common alone for generic objects now.

Question 15

15 Siamese trackers? 📊 medium

Answer

Answer: Template branch + search region CNN—fast SOT without online fine-tuning in early versions (SiamFC family).

Question 16

16 Transformer MOT? 🔥 hard

Answer

Answer: Track queries attend across space and time (e.g. TrackFormer), part of the trend toward joint detection and association in one model.

Question 17

17 Real-time MOT? 📊 medium

Answer

Answer: Light detector + simple association (SORT) or specialized accelerators—appearance models add compute.

Question 18

18 BEV tracking? 🔥 hard

Answer

Answer: Track in bird’s-eye view from multi-camera or LiDAR—used in autonomous driving stacks.

Question 19

19 3D MOT? 📊 medium

Answer

Answer: Associate 3D boxes or point clusters—IoU in 3D or GIoU variants; Kalman in xyz + yaw.

Question 20

20 Common benchmarks? ⚡ easy

Answer

Answer: MOTChallenge, KITTI tracking, nuScenes tracking—each defines detection input protocol and metrics.

Question 21

21 What is the Kalman filter? 📊 medium

Answer

Answer: Optimal recursive estimator for linear systems with Gaussian noise—alternates prediction from dynamics and correction from noisy observations.

Question 22

22 State-space form? 🔥 hard

Answer

Answer: x_{k+1} = F x_k + w_k (process noise), z_k = H x_k + v_k (measurement noise)—Kalman assumes linear F,H and Gaussian w,v.

Question 23

23 Typical bbox state in SORT? 📊 medium

Answer

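Answer: Often [cx, cy, s, r, vx, vy, vs]: box center, scale s (box area), aspect ratio r, and the velocities of cx, cy, and s. The aspect ratio is assumed constant, so it has no velocity term; the detector measures only [cx, cy, s, r].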
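To make the measurement part of that state concrete, here is a small sketch converting a corner-format box to (cx, cy, s, r) and back, taking s as box area and r as width over height. The function names and the `(x1, y1, x2, y2)` input format are assumptions for illustration.

```python
def box_to_measurement(box):
    """Convert (x1, y1, x2, y2) to (cx, cy, s, r): center, area, aspect ratio."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    return (x1 + w / 2.0, y1 + h / 2.0, w * h, w / h)


def measurement_to_box(z):
    """Invert box_to_measurement: recover (x1, y1, x2, y2)."""
    cx, cy, s, r = z
    w = (s * r) ** 0.5  # s * r = (w * h) * (w / h) = w^2
    h = s / w
    return (cx - w / 2.0, cy - h / 2.0, cx + w / 2.0, cy + h / 2.0)
```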

Question 24

24 Predict step? 📊 medium

Answer

Answer: x̂− = F x̂, P− = F P Fᵀ + Q—propagate mean and covariance forward in time without new measurement.

Question 25

25 Update step? 📊 medium

Answer

Answer: Fuse measurement z using Kalman gain K = P− Hᵀ (H P− Hᵀ + R)⁻¹: x̂ = x̂− + K(z − H x̂−), P = (I − K H) P−. This reduces uncertainty along the observed dimensions.
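The predict and update equations can be sketched for the smallest useful case: a 1D constant-velocity state [position, velocity] where only position is measured, so F = [[1, dt], [0, 1]] and H = [1, 0]. The matrix algebra is written out by hand to keep the example dependency-free. The diagonal Q = q·I and the default values of `q` and `r` are simplifying assumptions, not recommended settings.

```python
def predict(x, P, dt=1.0, q=0.01):
    """x- = F x and P- = F P F^T + Q for F = [[1, dt], [0, 1]], Q = q * I."""
    p, v = x
    (a, b), (c, d) = P
    x_pred = [p + dt * v, v]
    P_pred = [
        [a + dt * (b + c) + dt * dt * d + q, b + dt * d],
        [c + dt * d, d + q],
    ]
    return x_pred, P_pred


def update(x_pred, P_pred, z, r=1.0):
    """Fuse a scalar position measurement z with H = [1, 0] and noise variance r."""
    (a, b), (c, d) = P_pred
    s = a + r                  # innovation covariance S = H P- H^T + R
    k0, k1 = a / s, c / s      # Kalman gain K = P- H^T S^-1
    y = z - x_pred[0]          # innovation z - H x-
    x = [x_pred[0] + k0 * y, x_pred[1] + k1 * y]
    P = [                      # P = (I - K H) P-
        [(1.0 - k0) * a, (1.0 - k0) * b],
        [c - k1 * a, d - k1 * b],
    ]
    return x, P
```

Feeding in positions that advance by one unit per frame, starting from zero velocity and a large initial P, should pull the velocity estimate toward 1 within a few frames, even though velocity is never measured directly.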

Question 26

26 Kalman gain meaning? 🔥 hard

Answer

Answer: K balances trust in prediction vs measurement based on covariances—if R small (accurate sensor), K larger, trust measurement more.
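For a scalar state observed directly (H = 1), the gain reduces to K = P⁻ / (P⁻ + R), which makes the trade-off easy to see. The helper below is an illustrative sketch; its name is not taken from any library.

```python
def scalar_kalman_gain(p_pred, r):
    """Gain for a directly observed scalar state: K = P- / (P- + R)."""
    return p_pred / (p_pred + r)


# An accurate sensor (small R) gives a gain near 1: follow the measurement.
gain_accurate_sensor = scalar_kalman_gain(1.0, 0.01)
# A noisy sensor (large R) gives a gain near 0: keep the prediction.
gain_noisy_sensor = scalar_kalman_gain(1.0, 100.0)
```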

Question 27

27 Tune Q? 📊 medium

Answer

Answer: Process noise covariance—higher Q = more model uncertainty, tracker follows measurements faster but noisier.

Question 28

28 Tune R? 📊 medium

Answer

Answer: Measurement noise—higher R = smoother track, lag on maneuvers; lower R = jittery if detector noisy.

Question 29

29 Constant velocity model? ⚡ easy

Answer

Answer: Assumes velocity stays constant between frames. Simple and works for smooth motion; fails on sharp turns.

Question 30

30 Constant acceleration? 📊 medium

Answer

Answer: Adds acceleration state for more expressive motion—better for maneuvering targets, more parameters to tune.

Question 31

31 When EKF? 🔥 hard

Answer

Answer: Nonlinear dynamics or measurement—linearize with Jacobians around current estimate; no longer globally optimal but widely used.

Question 32

32 UKF / particle? 🔥 hard

Answer

Answer: Handle stronger nonlinearities—UKF uses sigma points; particle filters for non-Gaussian multimodal posteriors (rare in simple MOT).

Question 33

33 Missing detection? ⚡ easy

Answer

Answer: Skip update; covariance grows with prediction-only steps until next match—standard in SORT when object temporarily not detected.

Question 34

34 Multi-dimensional measurements? 📊 medium

Answer

Answer: H maps the state to the observed variables. For example, only position is observed; velocity is not measured directly but is inferred from motion over time.

Question 35

35 What is P? 📊 medium

Answer

Answer: State estimate covariance—uncertainty ellipsoid; should shrink after informative updates.

Question 36

36 Initialize velocity? ⚡ easy

Answer

Answer: From finite differences of first two boxes or zero velocity with high initial P—tradeoff between fast lock vs overshoot.

Question 37

37 SORT’s use? 📊 medium

Answer

Answer: Each track maintains Kalman state; Hungarian matches detections to predicted boxes—simple, fast MOT baseline.

Question 38

38 OpenCV? ⚡ easy

Answer

Answer: cv2.KalmanFilter exposes transitionMatrix, measurementMatrix, processNoiseCov (Q), and measurementNoiseCov (R); encode dt in the transition matrix for bbox tracking experiments.

Question 39

39 Numerical issues? 🔥 hard

Answer

Answer: Use the Joseph form for the P update, enforce symmetry, or use square-root filtering when round-off makes the covariance asymmetric or indefinite.

Question 40

40 When Kalman fails? 📊 medium

Answer

Answer: Highly nonlinear motion, multi-modal uncertainty (occlusions), or heavy-tailed detector noise—consider particle, IMM, or learning-based motion.

Question 41

41 What is SORT? 📊 medium

Answer

Answer: Simple online MOT: Kalman filter motion model + Hungarian assignment with IoU cost between predicted and detected boxes—very fast.

Question 42

42 Steps each frame? 📊 medium

Answer

Answer: Predict all tracks → match detections to tracks by IoU → update matched with Kalman measurement → create new tracks for unmatched dets → delete stale tracks.

Question 43

43 Why Hungarian? ⚡ easy

Answer

Answer: Optimal one-to-one assignment minimizing total cost—better than greedy max-IoU for competing hypotheses.
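A tiny example shows where greedy matching loses to the optimal assignment. The sketch below compares a greedy pass against a brute-force search over permutations, which stands in for the Hungarian algorithm here: it gives the same answer on small square matrices but costs O(n!). In practice a solver such as SciPy's `linear_sum_assignment` is commonly used. The cost values are made up for illustration and can be read as 1 − IoU.

```python
from itertools import permutations


def greedy_assign(cost):
    """Repeatedly take the cheapest remaining (row, col) cell."""
    cells = sorted(
        (cost[i][j], i, j)
        for i in range(len(cost))
        for j in range(len(cost[0]))
    )
    used_rows, used_cols, pairs = set(), set(), []
    for _, i, j in cells:
        if i not in used_rows and j not in used_cols:
            used_rows.add(i)
            used_cols.add(j)
            pairs.append((i, j))
    return sorted(pairs)


def optimal_assign(cost):
    """Brute-force minimum-cost assignment for a small square matrix."""
    n = len(cost)
    best = min(
        permutations(range(n)),
        key=lambda perm: sum(cost[i][perm[i]] for i in range(n)),
    )
    return [(i, best[i]) for i in range(n)]


def total_cost(cost, pairs):
    return sum(cost[i][j] for i, j in pairs)


# Rows are tracks, columns are detections (illustrative values).
cost = [
    [0.10, 0.20],
    [0.15, 0.90],
]
greedy_pairs = greedy_assign(cost)    # grabs the 0.10 cell, then is forced into 0.90
optimal_pairs = optimal_assign(cost)  # accepts 0.20 so the other track can take 0.15
```

Greedy commits to the single best pair and leaves the second track with a poor match, while the optimal assignment gives up a little on the first track to lower the total.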

Question 44

44 Cost matrix? 📊 medium

Answer

Answer: Often 1 − IoU or negative IoU with threshold—reject matches below IoU min (no assignment).

Question 45

45 max_age / min_hits? 📊 medium

Answer

Answer: Delete track if unmatched for max_age frames; confirm birth only after min_hits to reduce spurious tracks from false positives.
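The birth and death rules can be sketched as a small bookkeeping class. This is a simplified illustration: the class and method names are hypothetical, the strict `>` comparison against `max_age` is one possible convention, and real implementations may also reset a hit streak after a miss.

```python
class TrackLifecycle:
    """Counts hits and misses to decide when a track is confirmed or stale."""

    def __init__(self, min_hits=3, max_age=1):
        self.min_hits = min_hits
        self.max_age = max_age
        self.hits = 0
        self.time_since_update = 0

    def mark_matched(self):
        """Call when a detection was associated with this track."""
        self.hits += 1
        self.time_since_update = 0

    def mark_missed(self):
        """Call when no detection matched this track in the current frame."""
        self.time_since_update += 1

    @property
    def confirmed(self):
        return self.hits >= self.min_hits

    @property
    def stale(self):
        return self.time_since_update > self.max_age
```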

Question 46

46 What does DeepSORT add? 🔥 hard

Answer

Answer: CNN appearance embedding + cosine distance combined with motion Mahalanobis gate—reduces ID switches when IoU ambiguous.

Question 47

47 Cosine metric learning? 📊 medium

Answer

Answer: Train embedding so same-ID images are closer than different-ID—used with triplet or classification losses on person crops.

Question 48

48 Cascade matching in DeepSORT? 🔥 hard

Answer

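Answer: Match confirmed tracks to detections in order of time since last update, most recently seen tracks first, using appearance plus the Mahalanobis gate; leftover and tentative tracks then go through IoU matching. This stops long-occluded tracks, whose predicted covariance has grown, from capturing detections. (Matching high-confidence detections first and low-confidence ones second is ByteTrack's scheme, not DeepSORT's cascade.)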

Question 49

49 Mahalanobis gate? 📊 medium

Answer

Answer: Reject association if innovation (z − Hx) is unlikely under predicted covariance—filters physically impossible jumps.
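A minimal sketch of such a gate for a 2D position measurement: compute the squared Mahalanobis distance of the innovation under the 2×2 innovation covariance S and compare it with a chi-square quantile. The 2D simplification, the function names, and the 95% threshold for 2 degrees of freedom (about 5.991) are assumptions for this example; a tracker that gates on a 4D box measurement would use the 4-degree-of-freedom quantile instead.

```python
CHI2_95_2DOF = 5.991  # 95% quantile of chi-square with 2 degrees of freedom


def mahalanobis_sq(z, z_pred, S):
    """Squared Mahalanobis distance y^T S^-1 y for a 2D innovation y = z - z_pred."""
    y0, y1 = z[0] - z_pred[0], z[1] - z_pred[1]
    (a, b), (c, d) = S
    det = a * d - b * c
    if det <= 0:
        raise ValueError("S must be positive definite")
    # S^-1 = (1 / det) * [[d, -b], [-c, a]]
    return (y0 * (d * y0 - b * y1) + y1 * (-c * y0 + a * y1)) / det


def passes_gate(z, z_pred, S, threshold=CHI2_95_2DOF):
    """True if the detection is a plausible match for the predicted measurement."""
    return mahalanobis_sq(z, z_pred, S) <= threshold
```

A track that has coasted through several missed frames has a larger S, so the same pixel offset yields a smaller distance and the gate widens automatically.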

Question 50

50 Descriptor dimension? ⚡ easy

Answer

Answer: Typical 128-D L2-normalized vector per detection crop—cosine distance = 1 − dot product.
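The identity cosine distance = 1 − dot product holds only once the vectors are L2-normalized. A short sketch, using plain Python lists in place of real 128-D embeddings:

```python
import math


def l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine_distance(a, b):
    """1 - dot product; equals cosine distance when a and b are unit length."""
    return 1.0 - sum(x * y for x, y in zip(a, b))
```

Identical directions give 0, orthogonal embeddings give 1, and opposite directions give 2.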

Question 51

51 Gallery of features? 📊 medium

Answer

Answer: Store recent embeddings per track for matching—manage length to balance memory and adaptability to appearance change.
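One way to sketch a per-track gallery is a bounded deque that evicts the oldest embedding, with the track-to-detection distance taken as the minimum cosine distance over the stored features. The class name, the default budget of 100, and the minimum-distance rule are assumptions for illustration; embeddings are assumed to be L2-normalized already.

```python
from collections import deque


class FeatureGallery:
    """Keeps the most recent appearance embeddings of one track."""

    def __init__(self, budget=100):
        # deque with maxlen drops the oldest feature once the budget is reached
        self.features = deque(maxlen=budget)

    def add(self, feature):
        self.features.append(list(feature))

    def distance(self, query):
        """Smallest cosine distance between the query and any stored feature."""
        if not self.features:
            raise ValueError("gallery is empty")
        return min(
            1.0 - sum(q * f for q, f in zip(query, feat))
            for feat in self.features
        )
```

A small budget adapts quickly to appearance change but forgets older views; a large one remembers more poses at the cost of memory and comparisons per match.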

Question 52

52 Occlusion? 📊 medium

Answer

Answer: IoU fails when overlapping—appearance helps reacquire correct ID after split; still hard in dense crowds.

Question 53

53 Why fast? ⚡ easy

Answer

Answer: Minimal overhead beyond detector—no heavy joint optimization per frame unlike some MHT approaches.

Question 54

54 What is ByteTrack? 📊 medium

Answer

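Answer: Tracking-by-detection method that keeps low-score detections instead of discarding them: after matching high-score detections, the remaining tracks are matched to low-score detections in a second pass. This recovers occluded objects that a score-thresholded tracker such as SORT might drop.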
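The two-pass idea can be sketched independently of the matcher. In the example below, `match_fn` is a hypothetical callable supplied by the caller (for instance an IoU-based assigner) that returns `(track_index, box_index)` pairs. The score thresholds are illustrative assumptions, not tuned values.

```python
def two_stage_associate(tracks, dets, match_fn, high=0.6, low=0.1):
    """Associate high-score detections first, then low-score ones.

    tracks:   list of track descriptors understood by match_fn
    dets:     list of (box, score) tuples
    match_fn: callable(tracks, boxes) -> list of (track_idx, box_idx)
    Returns (matched, unmatched_high): matched maps a track index to its
    (box, score); unmatched_high are candidates for new tracks.
    """
    high_dets = [d for d in dets if d[1] >= high]
    low_dets = [d for d in dets if low <= d[1] < high]

    # Pass 1: all tracks against high-score detections.
    first = match_fn(tracks, [d[0] for d in high_dets])
    matched = {ti: high_dets[di] for ti, di in first}

    # Pass 2: tracks still unmatched against low-score detections.
    remaining = [ti for ti in range(len(tracks)) if ti not in matched]
    second = match_fn([tracks[ti] for ti in remaining],
                      [d[0] for d in low_dets])
    for ri, di in second:
        matched[remaining[ri]] = low_dets[di]

    # Only unmatched high-score detections may start new tracks;
    # unmatched low-score detections are dropped as likely background.
    used_high = {di for _, di in first}
    unmatched_high = [d for i, d in enumerate(high_dets) if i not in used_high]
    return matched, unmatched_high
```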

Question 55

55 BoT-SORT? 🔥 hard

Answer

Answer: Adds camera motion compensation + improved Re-ID—strong MOTChallenge scores.

Question 56

56 Dense crowds? 📊 medium

Answer

Answer: IoU-only methods degrade—appearance, higher-order models, or transformer MOT help.

Question 57

57 Train appearance? 📊 medium

Answer

Answer: On person re-ID datasets (Market1501, etc.) separate from detector—domain gap to target scene matters.

Question 58

58 SORT limits? ⚡ easy

Answer

Answer: Assumes good detector; IoU association weak under fast motion / low FPS; camera motion not modeled in vanilla SORT.

Question 59

59 vs joint detectors? 🔥 hard

Answer

Answer: TrackFormer / MOTR predict tracks end-to-end—no hand-crafted association but need more data and compute.

Question 60

60 Production? 📊 medium

Answer

Answer: Keep tracker throughput at or above the detector's frame rate; batch crops through the Re-ID CNN; tune thresholds per scene; log ID switches for QA.
