Computer Vision Interview 20 essential Q&A Updated 2026

SORT & DeepSORT: 20 Essential Q&A

A fast MOT baseline—Kalman + Hungarian—and the Re-ID upgrade for crowded scenes.

Hungarian · Kalman · cosine · cascade
1 What is SORT? 📊 medium
Answer: Simple Online and Realtime Tracking, a multi-object tracking (MOT) baseline: a Kalman filter motion model plus Hungarian assignment with an IoU cost between predicted and detected boxes. Very fast.
2 Steps each frame? 📊 medium
Answer: Predict all tracks → match detections to tracks by IoU → update matched with Kalman measurement → create new tracks for unmatched dets → delete stale tracks.
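The predict and update steps can be sketched with a toy Kalman filter. This is a minimal illustration, assuming a 1-D constant-velocity state [position, velocity] rather than SORT's full box state; the matrices `F`, `H`, `Q`, `R` and their values are illustrative choices, not taken from any reference implementation.

```python
import numpy as np

def predict(x, P, F, Q):
    """Propagate the state mean and covariance one frame ahead."""
    return F @ x, F @ P @ F.T + Q

def update(x, P, z, H, R):
    """Correct a predicted state with a matched measurement z."""
    y = z - H @ x                      # innovation
    S = H @ P @ H.T + R                # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)     # Kalman gain
    x_new = x + K @ y
    P_new = (np.eye(len(x)) - K @ H) @ P
    return x_new, P_new

# Toy constant-velocity model (assumed values): position += velocity each frame.
F = np.array([[1.0, 1.0], [0.0, 1.0]])   # state transition
H = np.array([[1.0, 0.0]])               # only position is observed
Q = 0.01 * np.eye(2)                     # process noise
R = np.array([[1.0]])                    # measurement noise

x, P = np.array([0.0, 1.0]), np.eye(2)
x_pred, P_pred = predict(x, P, F, Q)                           # "predict all tracks"
x_upd, P_upd = update(x_pred, P_pred, np.array([1.2]), H, R)   # "update matched"
```

The updated position lands between the prediction and the measurement, and the position variance shrinks after the update, which is the behavior the per-frame loop relies on.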
3 Why Hungarian? ⚡ easy
Answer: Optimal one-to-one assignment minimizing total cost—better than greedy max-IoU for competing hypotheses.
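A small example of two tracks competing for the same detection shows the difference. This sketch assumes SciPy is available and uses `scipy.optimize.linear_sum_assignment` as the optimal solver; the IoU values and the `greedy_match` helper are made up for illustration.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Rows = tracks, columns = detections (hypothetical IoU values).
iou = np.array([[0.90, 0.80],
                [0.85, 0.10]])

def greedy_match(iou):
    """Repeatedly take the highest remaining IoU pair."""
    iou = iou.copy()
    pairs = []
    for _ in range(min(iou.shape)):
        r, c = np.unravel_index(np.argmax(iou), iou.shape)
        pairs.append((int(r), int(c)))
        iou[r, :] = -1.0   # this track is used up
        iou[:, c] = -1.0   # this detection is used up
    return pairs

greedy_pairs = greedy_match(iou)

# Optimal assignment minimizes total cost, here 1 - IoU.
rows, cols = linear_sum_assignment(1.0 - iou)
hungarian_pairs = list(zip(rows.tolist(), cols.tolist()))

greedy_total = sum(iou[r, c] for r, c in greedy_pairs)
hungarian_total = sum(iou[r, c] for r, c in hungarian_pairs)
```

Greedy grabs the 0.90 pair first and leaves track 1 with a 0.10 match, while the optimal assignment accepts a slightly worse match for track 0 so that both tracks get a plausible detection.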
4 Cost matrix? 📊 medium
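Answer: Usually 1 − IoU (equivalently, negative IoU) plus an IoU threshold: any assigned pair whose IoU falls below the minimum is rejected, and both the track and the detection are treated as unmatched.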
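As a rough sketch, the cost matrix and the threshold mask could be built as follows. It assumes boxes in [x1, y1, x2, y2] format with non-zero area; `IOU_MIN = 0.3` and the box coordinates are illustrative values.

```python
import numpy as np

def iou_matrix(tracks, dets):
    """Pairwise IoU between (T, 4) and (D, 4) boxes in [x1, y1, x2, y2] format."""
    t = tracks[:, None, :]   # (T, 1, 4)
    d = dets[None, :, :]     # (1, D, 4)
    ix1 = np.maximum(t[..., 0], d[..., 0])
    iy1 = np.maximum(t[..., 1], d[..., 1])
    ix2 = np.minimum(t[..., 2], d[..., 2])
    iy2 = np.minimum(t[..., 3], d[..., 3])
    inter = np.clip(ix2 - ix1, 0, None) * np.clip(iy2 - iy1, 0, None)
    area_t = (t[..., 2] - t[..., 0]) * (t[..., 3] - t[..., 1])
    area_d = (d[..., 2] - d[..., 0]) * (d[..., 3] - d[..., 1])
    return inter / (area_t + area_d - inter)

IOU_MIN = 0.3  # assumed threshold; tune per scene
tracks = np.array([[0, 0, 10, 10], [20, 20, 30, 30]], dtype=float)
dets = np.array([[1, 1, 11, 11], [50, 50, 60, 60]], dtype=float)

iou = iou_matrix(tracks, dets)
cost = 1.0 - iou          # matrix handed to the assignment solver
valid = iou >= IOU_MIN    # assigned pairs outside this mask are discarded
```

Here only track 0 and detection 0 overlap enough to be a valid match; the second track stays unmatched and the second detection would start a new track.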
5 max_age / min_hits? 📊 medium
Answer: Delete track if unmatched for max_age frames; confirm birth only after min_hits to reduce spurious tracks from false positives.
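The two thresholds can be illustrated with a simplified lifecycle. This is a sketch, not a copy of any reference tracker: `TrackState`, `step`, and the default values are hypothetical, and confirmation here requires `min_hits` consecutive matches.

```python
from dataclasses import dataclass

@dataclass
class TrackState:
    hit_streak: int = 0          # consecutive matched frames
    time_since_update: int = 0   # frames since the last match
    confirmed: bool = False

def step(track, matched, max_age=3, min_hits=3):
    """Advance one frame; return False once the track should be deleted."""
    if matched:
        track.hit_streak += 1
        track.time_since_update = 0
        if track.hit_streak >= min_hits:
            track.confirmed = True
    else:
        track.hit_streak = 0
        track.time_since_update += 1
    return track.time_since_update <= max_age

# A real object: three hits confirm it, then it survives max_age misses.
real = TrackState()
real_alive = [step(real, m) for m in [True, True, True, False, False, False, False]]

# A one-frame false positive: never confirmed, deleted after max_age misses.
flicker = TrackState()
flicker_alive = [step(flicker, m) for m in [True, False, False, False, False]]
```

A larger `max_age` bridges longer occlusions at the cost of keeping ghost tracks around; a larger `min_hits` suppresses more false positives but delays the first output of a new object.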
6 What does DeepSORT add? 🔥 hard
Answer: CNN appearance embedding + cosine distance combined with motion Mahalanobis gate—reduces ID switches when IoU ambiguous.
7 Cosine metric learning? 📊 medium
Answer: Train embedding so same-ID images are closer than different-ID—used with triplet or classification losses on person crops.
8 Cascade matching in DeepSORT? 🔥 hard
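Answer: Tracks are matched in order of age since their last update: recently updated tracks get first pick of the detections using appearance + motion, then progressively older (longer-occluded) tracks are matched against what remains. A final IoU pass handles unconfirmed and leftover tracks. This stops long-occluded tracks, whose predicted uncertainty has grown, from stealing detections from recently seen ones.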
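In the original DeepSORT formulation the cascade iterates over tracks by frames since last update, most recent first. A minimal sketch of that idea, assuming SciPy is available and that `cost` is an already gated track-by-detection cost matrix; the function name, arguments, and example values are illustrative.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def matching_cascade(cost, track_ages, max_age, max_cost):
    """cost: (T, D) matrix; track_ages: frames since each track's last update (>= 1)."""
    unmatched_dets = list(range(cost.shape[1]))
    matches = []
    for age in range(1, max_age + 1):          # most recently seen tracks first
        if not unmatched_dets:
            break
        level_tracks = [t for t, a in enumerate(track_ages) if a == age]
        if not level_tracks:
            continue
        sub = cost[np.ix_(level_tracks, unmatched_dets)]
        rows, cols = linear_sum_assignment(sub)
        taken = set()
        for r, c in zip(rows, cols):
            if sub[r, c] <= max_cost:          # reject implausible pairs
                matches.append((level_tracks[r], unmatched_dets[c]))
                taken.add(unmatched_dets[c])
        unmatched_dets = [d for d in unmatched_dets if d not in taken]
    return matches, unmatched_dets

# Two tracks compete for one detection. Track 0 was last seen 3 frames ago and
# has the lower cost; track 1 was seen in the previous frame.
cost = np.array([[0.1],
                 [0.2]])
matches, unmatched = matching_cascade(cost, track_ages=[3, 1], max_age=3, max_cost=0.5)
```

A single global assignment would hand the detection to track 0; the cascade gives it to the recently seen track 1 instead.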
9 Mahalanobis gate? 📊 medium
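Answer: Reject an association if the innovation y = z − Hx is unlikely under the predicted innovation covariance S = HPHᵀ + R, i.e. if the squared Mahalanobis distance yᵀS⁻¹y exceeds a chi-square threshold (9.4877 for DeepSORT's 4-D measurement at 95% confidence). This filters physically implausible jumps.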
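The gate can be sketched in a few lines. For brevity this assumes a 4-D state observed directly (`H` is the identity) with made-up covariances; 9.4877 is the 0.95 quantile of the chi-square distribution with 4 degrees of freedom.

```python
import numpy as np

CHI2_95_4DOF = 9.4877  # 95% chi-square quantile, 4 degrees of freedom

def mahalanobis_sq(z, x, P, H, R):
    """Squared Mahalanobis distance of measurement z from the predicted state."""
    y = z - H @ x                 # innovation
    S = H @ P @ H.T + R           # innovation covariance
    return float(y @ np.linalg.solve(S, y))

# Illustrative values: measurement = (cx, cy, aspect ratio, height).
H = np.eye(4)
x = np.array([100.0, 100.0, 0.5, 50.0])
P = np.diag([4.0, 4.0, 0.01, 4.0])
R = np.diag([1.0, 1.0, 0.01, 1.0])

d_near = mahalanobis_sq(x + np.array([2.0, 1.0, 0.0, 1.0]), x, P, H, R)
d_far = mahalanobis_sq(x + np.array([30.0, 0.0, 0.0, 0.0]), x, P, H, R)

near_ok = d_near <= CHI2_95_4DOF   # plausible motion, keep as a candidate
far_ok = d_far <= CHI2_95_4DOF     # 30-pixel jump, gated out
```

In practice the gate is applied as a mask on the cost matrix before assignment, so gated-out pairs can never be selected.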
10 Descriptor dimension? ⚡ easy
Answer: Typical 128-D L2-normalized vector per detection crop—cosine distance = 1 − dot product.
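The identity between cosine distance and the dot product of unit vectors can be checked directly. A minimal sketch with random stand-in embeddings (no real Re-ID network is involved):

```python
import numpy as np

def l2_normalize(v):
    """Scale each row to unit length."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def cosine_distance(a, b):
    """a: (N, D), b: (M, D) -> (N, M) matrix of 1 - cosine similarity."""
    return 1.0 - l2_normalize(a) @ l2_normalize(b).T

rng = np.random.default_rng(0)
feat = rng.normal(size=(1, 128))        # stand-in for a 128-D descriptor

same = cosine_distance(feat, feat)      # identical direction
opposite = cosine_distance(feat, -feat) # opposite direction
basis = np.eye(128)[:2]
orthogonal = cosine_distance(basis[:1], basis[1:])
```

The distance ranges from 0 (same direction) through 1 (orthogonal) to 2 (opposite), so a matching threshold is typically set well below 1.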
11 Gallery of features? 📊 medium
Answer: Store recent embeddings per track for matching—manage length to balance memory and adaptability to appearance change.
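One way to sketch such a gallery is a fixed-length queue per track, scoring a new detection by its smallest cosine distance to any stored embedding. The class name, the `budget` parameter, and the 2-D toy features are hypothetical.

```python
from collections import deque
import numpy as np

class FeatureGallery:
    """Keeps the most recent `budget` unit-length embeddings of one track."""

    def __init__(self, budget=100):
        self.features = deque(maxlen=budget)   # oldest entries fall off automatically

    def add(self, feat):
        self.features.append(feat / np.linalg.norm(feat))

    def distance(self, feat):
        """Smallest cosine distance between feat and any stored embedding."""
        feat = feat / np.linalg.norm(feat)
        gallery = np.stack(self.features)       # (n, D)
        return float(np.min(1.0 - gallery @ feat))

g = FeatureGallery(budget=2)
for f in ([1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]):
    g.add(np.array(f))                          # the first feature is evicted

d_seen = g.distance(np.array([0.0, 2.0]))       # matches a stored view
d_evicted = g.distance(np.array([1.0, 0.0]))    # that view is no longer stored
```

A small budget adapts quickly to appearance change but forgets earlier views; a large one remembers more poses at the cost of memory and comparison time.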
12 Occlusion? 📊 medium
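Answer: IoU association becomes ambiguous when boxes overlap heavily, and an occluded object may produce no detection at all. Appearance features help reacquire the correct ID once the objects separate, but dense crowds remain hard.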
13 Why fast? ⚡ easy
Answer: Minimal overhead beyond detector—no heavy joint optimization per frame unlike some MHT approaches.
14 What is ByteTrack? 📊 medium
Answer: Associates high-score detections to tracks first, then matches the still-unmatched tracks against low-score detections in a second pass. This recovers occluded objects that SORT would drop.
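The two-pass idea can be sketched on top of a precomputed IoU matrix. This is a simplified illustration, assuming SciPy is available; the score thresholds, IoU minimum, and function names are assumptions rather than ByteTrack's actual code.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def assign(iou, iou_min):
    """Optimal assignment on 1 - IoU, dropping pairs below iou_min."""
    if iou.size == 0:
        return []
    rows, cols = linear_sum_assignment(1.0 - iou)
    return [(int(r), int(c)) for r, c in zip(rows, cols) if iou[r, c] >= iou_min]

def two_pass_associate(iou, scores, high=0.6, low=0.1, iou_min=0.3):
    """iou: (T, D) track-vs-detection IoU; scores: (D,) detector confidences."""
    high_idx = np.flatnonzero(scores >= high)
    low_idx = np.flatnonzero((scores >= low) & (scores < high))

    # Pass 1: all tracks against high-score detections.
    first = assign(iou[:, high_idx], iou_min)
    matches = [(t, int(high_idx[d])) for t, d in first]

    # Pass 2: leftover tracks against low-score detections.
    matched_tracks = {t for t, _ in first}
    rest = np.array([t for t in range(iou.shape[0]) if t not in matched_tracks], dtype=int)
    second = assign(iou[np.ix_(rest, low_idx)], iou_min)
    matches += [(int(rest[t]), int(low_idx[d])) for t, d in second]
    return sorted(matches)

# Track 1 is partly occluded, so its detection only scores 0.3.
iou = np.array([[0.8, 0.0],
                [0.0, 0.7]])
scores = np.array([0.9, 0.3])

matches = two_pass_associate(iou, scores)
high_only = assign(iou[:, scores >= 0.6], 0.3)   # what a high-score-only tracker sees
```

With only the high-score detection, track 1 goes unmatched and would start aging toward deletion; the second pass keeps it alive using the low-score box.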
15 BoT-SORT? 🔥 hard
Answer: Adds camera motion compensation + improved Re-ID—strong MOTChallenge scores.
16 Dense crowds? 📊 medium
Answer: IoU-only methods degrade—appearance, higher-order models, or transformer MOT help.
17 Train appearance? 📊 medium
Answer: On person re-ID datasets (Market-1501, MARS, etc.), separately from the detector. The domain gap between the training data and the target scene matters.
18 SORT limits? ⚡ easy
Answer: Assumes good detector; IoU association weak under fast motion / low FPS; camera motion not modeled in vanilla SORT.
19 vs joint detectors? 🔥 hard
Answer: TrackFormer / MOTR predict tracks end-to-end—no hand-crafted association but need more data and compute.
20 Production? 📊 medium
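Answer: Keep tracker throughput at or above the detector's FPS so it never becomes the bottleneck; batch the Re-ID CNN over all crops in a frame; tune thresholds (IoU min, max_age, appearance distance) per scene; log ID switches for QA.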

SORT / DeepSORT Cheat Sheet

SORT
  • Kalman + IoU
  • Hungarian
DeepSORT
  • Appearance
  • Cascade
Follow-on
  • ByteTrack

💡 Pro tip: DeepSORT adds appearance when IoU is not enough.
