Computer Vision Interview
20 essential Q&A
Updated 2026
SORT & DeepSORT: 20 Essential Q&A
A fast MOT baseline—Kalman + Hungarian—and the Re-ID upgrade for crowded scenes.
~11 min read
20 questions
Advanced
Hungarian, Kalman, cosine, cascade
1
What is SORT?
📊 medium
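Answer: SORT (Simple Online and Realtime Tracking) is an online multi-object tracker that pairs a constant-velocity Kalman filter motion model with Hungarian assignment, using an IoU cost between predicted and detected boxes. It adds very little compute on top of the detector.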
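A minimal sketch of the constant-velocity Kalman filter behind SORT, using NumPy. The seven-dimensional state [u, v, s, r, u̇, v̇, ṡ] (box centre, area, aspect ratio, and their velocities) follows the SORT formulation; the function names and the noise matrices Q and R you pass in are illustrative assumptions, not values from the original tracker.

```python
import numpy as np

# State x = [u, v, s, r, du, dv, ds]: box centre (u, v), area s, aspect ratio r
# (assumed constant, so it has no velocity term), plus per-frame velocities.
F = np.eye(7)
F[0, 4] = F[1, 5] = F[2, 6] = 1.0  # constant-velocity transition, dt = 1 frame
H = np.eye(4, 7)                   # a detection observes [u, v, s, r] only


def kf_predict(x, P, Q):
    """Propagate the state mean and covariance one frame ahead."""
    x = F @ x
    P = F @ P @ F.T + Q
    return x, P


def kf_update(x, P, z, R):
    """Correct the prediction with a matched detection z = [u, v, s, r]."""
    y = z - H @ x                   # innovation
    S = H @ P @ H.T + R             # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)  # Kalman gain
    x = x + K @ y
    P = (np.eye(7) - K @ H) @ P
    return x, P
```

A track moving 2 px per frame in u is extrapolated by 2 px in `kf_predict`; `kf_update` then pulls the estimate part of the way toward the detection, weighted by the relative uncertainties.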
2
Steps each frame?
📊 medium
Answer: Predict all tracks with the Kalman filter → match detections to tracks by IoU → run the Kalman update on each matched track with its detection → create new tracks for unmatched detections → delete stale tracks.
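The five steps can be sketched as a single per-frame function. This is a simplified sketch, not SORT's reference code: to stay short, each hypothetical `Track` extrapolates its box corners with a constant velocity instead of running a full Kalman filter, and the thresholds `iou_min` and `max_age` are illustrative.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def iou(a, b):
    """IoU of two boxes in [x1, y1, x2, y2] format."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


class Track:
    _next_id = 1

    def __init__(self, box):
        self.box = np.asarray(box, dtype=float)
        self.vel = np.zeros(4)  # per-frame change of each box coordinate
        self.id = Track._next_id
        Track._next_id += 1
        self.misses = 0

    def predict(self):
        # Stand-in for the Kalman predict: constant-velocity extrapolation.
        self.box = self.box + self.vel

    def update(self, det):
        det = np.asarray(det, dtype=float)
        # Previous box = predicted box minus the velocity added in predict().
        self.vel = det - (self.box - self.vel)
        self.box = det
        self.misses = 0


def step(tracks, dets, iou_min=0.3, max_age=1):
    """One SORT-style frame; `dets` is a list of [x1, y1, x2, y2] boxes."""
    for t in tracks:                      # 1. predict
        t.predict()
    matched_t, matched_d = set(), set()
    if tracks and dets:                   # 2. associate on cost = 1 - IoU
        cost = np.array([[1.0 - iou(t.box, d) for d in dets] for t in tracks])
        rows, cols = linear_sum_assignment(cost)
        for r, c in zip(rows.tolist(), cols.tolist()):
            if 1.0 - cost[r, c] >= iou_min:   # reject weak overlaps
                tracks[r].update(dets[c])     # 3. update matched tracks
                matched_t.add(r)
                matched_d.add(c)
    for i, t in enumerate(tracks):
        if i not in matched_t:
            t.misses += 1
    # 4. Birth: one new track per unmatched detection.
    tracks = tracks + [Track(d) for j, d in enumerate(dets) if j not in matched_d]
    # 5. Death: drop tracks unmatched for more than max_age frames.
    return [t for t in tracks if t.misses <= max_age]
```

Calling `step` once per frame with the detector's boxes keeps IDs stable while boxes keep overlapping and removes a track after `max_age + 1` empty frames.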
3
Why Hungarian?
⚡ easy
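Answer: It finds the one-to-one assignment with minimum total cost. When several tracks compete for the same detection, greedy max-IoU matching can lock in one locally best pair and force a poor match elsewhere; the Hungarian solution is globally optimal for the frame.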
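A small worked example, assuming an invented 2×2 IoU matrix, shows greedy matching losing to the optimal assignment. SciPy's `linear_sum_assignment` solves the same minimum-cost assignment problem the Hungarian algorithm does.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Rows = predicted tracks, columns = detections; IoU values invented for illustration.
iou_mat = np.array([[0.9, 0.8],
                    [0.8, 0.1]])


def greedy(iou):
    """Repeatedly take the highest remaining IoU pair."""
    iou = iou.copy()
    pairs = []
    for _ in range(min(iou.shape)):
        r, c = np.unravel_index(np.argmax(iou), iou.shape)
        pairs.append((int(r), int(c)))
        iou[r, :] = -1.0  # row and column are now used up
        iou[:, c] = -1.0
    return pairs


rows, cols = linear_sum_assignment(1.0 - iou_mat)  # minimize total (1 - IoU)
hungarian = list(zip(rows.tolist(), cols.tolist()))
```

Greedy grabs the 0.9 pair first and is left with 0.1 (total IoU 1.0), while the optimal assignment takes both 0.8 pairs (total IoU 1.6).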
4
Cost matrix?
📊 medium
Answer: Usually 1 − IoU (or −IoU). After solving, reject any assigned pair whose IoU is below a minimum threshold (e.g. 0.3), leaving that track and detection unmatched.
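A vectorized sketch of the cost matrix and the post-assignment IoU gate. The function names and the default `iou_min=0.3` are illustrative assumptions; boxes are NumPy arrays in [x1, y1, x2, y2] format with non-zero area.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def iou_matrix(tracks, dets):
    """Pairwise IoU between (T, 4) and (D, 4) boxes in [x1, y1, x2, y2] format."""
    t = tracks[:, None, :]  # (T, 1, 4)
    d = dets[None, :, :]    # (1, D, 4)
    iw = np.clip(np.minimum(t[..., 2], d[..., 2]) - np.maximum(t[..., 0], d[..., 0]), 0, None)
    ih = np.clip(np.minimum(t[..., 3], d[..., 3]) - np.maximum(t[..., 1], d[..., 1]), 0, None)
    inter = iw * ih
    area_t = (t[..., 2] - t[..., 0]) * (t[..., 3] - t[..., 1])
    area_d = (d[..., 2] - d[..., 0]) * (d[..., 3] - d[..., 1])
    return inter / (area_t + area_d - inter)


def associate(tracks, dets, iou_min=0.3):
    """Solve on cost = 1 - IoU, then drop assigned pairs whose IoU < iou_min."""
    iou = iou_matrix(tracks, dets)
    rows, cols = linear_sum_assignment(1.0 - iou)
    matches = [(int(r), int(c)) for r, c in zip(rows, cols) if iou[r, c] >= iou_min]
    used_t = {r for r, _ in matches}
    used_d = {c for _, c in matches}
    unmatched_tracks = [i for i in range(len(tracks)) if i not in used_t]
    unmatched_dets = [j for j in range(len(dets)) if j not in used_d]
    return matches, unmatched_tracks, unmatched_dets
```

The solver always returns a full assignment, so the gate is what turns a forced zero-overlap pairing back into "unmatched".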
5
max_age / min_hits?
📊 medium
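Answer: Delete a track once it has gone unmatched for more than max_age consecutive frames; confirm (report) a new track only after it has been matched in min_hits frames, which suppresses spurious tracks born from one-off false positives.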
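A sketch of the per-track counters behind these two rules. `TrackState` is a hypothetical helper, and the defaults (`min_hits=3`, `max_age=1`) are illustrative values rather than a recommendation.

```python
from dataclasses import dataclass


@dataclass
class TrackState:
    """Bookkeeping for one track (hypothetical helper, not a library class)."""
    hit_streak: int = 0          # consecutive frames with a matched detection
    time_since_update: int = 0   # consecutive frames without one

    def mark_matched(self) -> None:
        self.hit_streak += 1
        self.time_since_update = 0

    def mark_missed(self) -> None:
        self.hit_streak = 0
        self.time_since_update += 1

    def is_confirmed(self, min_hits: int = 3) -> bool:
        # Only report the track once it has survived min_hits matches in a row.
        return self.hit_streak >= min_hits

    def should_delete(self, max_age: int = 1) -> bool:
        return self.time_since_update > max_age
```

A single false-positive detection creates a tentative track that is never confirmed and is deleted after `max_age + 1` missed frames.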
6
What does DeepSORT add?
🔥 hard
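Answer: A CNN appearance descriptor per detection. The association cost combines the cosine appearance distance with the Mahalanobis motion distance, and both also act as gates, which reduces ID switches when IoU alone is ambiguous (e.g. after occlusion).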
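A sketch of the combined, gated cost c = λ·d_maha + (1 − λ)·d_cos, assuming both distance matrices are already computed. The chi-square gate 9.4877 is the 95% quantile for a 4-D measurement; the appearance threshold `cos_max=0.2`, the `INFEASIBLE` sentinel, and the default `lam=0.0` (appearance-only cost, motion used purely as a gate) are illustrative assumptions.

```python
import numpy as np

CHI2_95_4DOF = 9.4877  # 95% chi-square quantile, 4 degrees of freedom
INFEASIBLE = 1e5       # large cost that marks a forbidden track/detection pair


def combined_cost(d_maha, d_cos, lam=0.0, cos_max=0.2):
    """
    d_maha: (T, D) squared Mahalanobis distances from the Kalman prediction.
    d_cos:  (T, D) smallest cosine distances between track galleries and detections.
    """
    cost = lam * d_maha + (1.0 - lam) * d_cos
    # A pair is admissible only if it passes BOTH the motion and appearance gates.
    gate = (d_maha > CHI2_95_4DOF) | (d_cos > cos_max)
    return np.where(gate, INFEASIBLE, cost)
```

Gated pairs keep a huge cost, so an assignment solver only picks them when nothing else is available, and a later threshold check can discard them.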
7
Cosine metric learning?
📊 medium
Answer: Train the embedding so that crops of the same identity are closer (in cosine distance) than crops of different identities, using triplet losses or classification-style losses on person crops.
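A NumPy sketch of the triplet variant computed on cosine distances, for a single batch of embeddings; the classification-style variant is not shown. The `margin=0.2` value and the function names are assumptions, and in practice this loss would live inside a deep-learning framework so it can be back-propagated.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge on cosine distances: want d(a, p) + margin <= d(a, n)."""
    a, p, n = (l2_normalize(np.asarray(v, dtype=float)) for v in (anchor, positive, negative))
    d_ap = 1.0 - np.sum(a * p, axis=-1)  # anchor vs same identity
    d_an = 1.0 - np.sum(a * n, axis=-1)  # anchor vs different identity
    return np.maximum(0.0, d_ap - d_an + margin).mean()
```

The loss is zero once same-identity pairs are closer than different-identity pairs by at least the margin, so only violating triplets produce gradient.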
8
Cascade matching in DeepSORT?
🔥 hard
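Answer: DeepSORT matches tracks in order of age (frames since their last successful match): tracks seen in the previous frame get first pick of the detections, then tracks missed for one more frame, and so on up to max_age. This counters a Kalman side effect: a long-lost track has inflated covariance, so its Mahalanobis distance looks small and it would otherwise steal detections from recently seen tracks. Unconfirmed tracks, plus tracks that were matched in the previous frame but not in the cascade, are then matched against the leftover detections by IoU. (Splitting detections by confidence into two passes is ByteTrack's idea, not DeepSORT's.)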
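A sketch of the age-ordered matching cascade, assuming a precomputed gated cost matrix and a per-track `time_since_update` counter that equals 1 for tracks matched in the previous frame. The `max_age` and `max_cost` defaults are illustrative, and the final IoU stage on leftovers is omitted.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def matching_cascade(cost, time_since_update, max_age=30, max_cost=0.2):
    """
    cost: (T, D) gated appearance/motion cost (large values = forbidden).
    time_since_update: (T,) frames since each track's last match (1 = last frame).
    Tracks seen more recently get first pick of the remaining detections.
    """
    unmatched_dets = list(range(cost.shape[1]))
    matches = []
    for age in range(1, max_age + 1):
        if not unmatched_dets:
            break
        track_idx = [i for i, a in enumerate(time_since_update) if a == age]
        if not track_idx:
            continue
        sub = cost[np.ix_(track_idx, unmatched_dets)]
        rows, cols = linear_sum_assignment(sub)
        newly = [(track_idx[r], unmatched_dets[c])
                 for r, c in zip(rows, cols) if sub[r, c] <= max_cost]
        matches += newly
        used = {d for _, d in newly}
        unmatched_dets = [d for d in unmatched_dets if d not in used]
    matched_tracks = {t for t, _ in matches}
    unmatched_tracks = [t for t in range(cost.shape[0]) if t not in matched_tracks]
    return matches, unmatched_tracks, unmatched_dets
```

With `cost = [[0.1, 0.15], [0.05, 0.9]]` and ages `[1, 3]`, track 0 claims detection 0 first even though track 1 has a lower cost to it; this priority is the point of the cascade.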
9
Mahalanobis gate?
📊 medium
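Answer: Compute the squared Mahalanobis distance of the innovation, d² = (z − Hx)ᵀ S⁻¹ (z − Hx) with S = HPHᵀ + R, and reject the pair if d² exceeds the 95% chi-square quantile (9.4877 for a 4-D measurement, as in DeepSORT). This filters out physically implausible jumps.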
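A sketch of the gate test with NumPy, assuming the predicted state `x_pred`, covariance `P_pred`, measurement matrix `H`, and measurement noise `R` come from the track's Kalman filter. `np.linalg.solve` avoids forming S⁻¹ explicitly.

```python
import numpy as np

CHI2_95_4DOF = 9.4877  # 95% chi-square quantile for a 4-D measurement


def mahalanobis_sq(z, x_pred, P_pred, H, R):
    """Squared Mahalanobis distance of measurement z from the prediction H x."""
    y = z - H @ x_pred             # innovation
    S = H @ P_pred @ H.T + R       # innovation covariance
    return float(y @ np.linalg.solve(S, y))


def passes_gate(z, x_pred, P_pred, H, R, threshold=CHI2_95_4DOF):
    return mahalanobis_sq(z, x_pred, P_pred, H, R) <= threshold
```

Because S grows while a track goes unmatched, the same pixel offset gives a smaller d² for an uncertain track, so the gate automatically widens for tracks that have been lost longer.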
10
Descriptor dimension?
⚡ easy
Answer: Typically a 128-D, L2-normalized vector per detection crop; for unit-length vectors, cosine distance = 1 − dot product.
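A quick check of the identity with random vectors standing in for CNN descriptors (the data is synthetic): after L2 normalization, one matrix product gives all pairwise cosine distances.

```python
import numpy as np

rng = np.random.default_rng(0)
raw = rng.normal(size=(3, 128))                          # stand-in for 3 CNN descriptors
feats = raw / np.linalg.norm(raw, axis=1, keepdims=True)  # L2-normalize each row

cos_dist = 1.0 - feats @ feats.T                          # (3, 3) pairwise cosine distances
```

Normalizing once per detection means matching a whole frame reduces to a single matrix multiply.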
11
Gallery of features?
📊 medium
Answer: Keep each track's recent embeddings (DeepSORT keeps the last 100) and score a detection by its smallest cosine distance to any of them. The gallery length trades memory against adaptability to appearance change.
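A sketch of a per-track gallery using a bounded `deque`, so the oldest embedding drops out automatically once the budget is reached. `FeatureGallery` is a hypothetical class; it assumes at least one embedding has been added before `distance` is called.

```python
from collections import deque

import numpy as np


class FeatureGallery:
    """Keeps the last `budget` L2-normalized descriptors of one track."""

    def __init__(self, budget=100):
        self.feats = deque(maxlen=budget)  # oldest entries are evicted automatically

    def add(self, feat):
        feat = np.asarray(feat, dtype=float)
        self.feats.append(feat / np.linalg.norm(feat))

    def distance(self, det_feat):
        """Smallest cosine distance between a detection and any stored descriptor."""
        det = np.asarray(det_feat, dtype=float)
        det = det / np.linalg.norm(det)
        return float(np.min(1.0 - np.stack(self.feats) @ det))
```

Taking the minimum over the gallery lets a detection match any recent view of the person (front, side, partially occluded), not just the latest one.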
12
Occlusion?
📊 medium
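Answer: IoU association becomes ambiguous when boxes overlap, and a fully occluded object produces no detections at all. Appearance features help reacquire the correct ID once the objects separate, but dense crowds remain hard.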
13
Why fast?
⚡ easy
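Answer: Minimal overhead beyond the detector: each frame needs only a Kalman predict/update per track and one assignment solve, with no heavy joint optimization over many hypotheses as in multiple hypothesis tracking (MHT).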
14
What is ByteTrack?
📊 medium
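Answer: ByteTrack keeps low-score detections instead of discarding them. It first matches high-score detections to tracks, then matches low-score detections to the still-unmatched tracks in a second pass, recovering occluded objects that a plain score threshold would drop.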
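A sketch of the two-pass idea with IoU-only matching in both passes. The thresholds `high=0.6` and `iou_min=0.2` are illustrative, the helper names are hypothetical, and ByteTrack's additional lower cut-off that discards near-zero-score boxes is left out for brevity.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def iou_matrix(a, b):
    """Pairwise IoU between (N, 4) and (M, 4) boxes in [x1, y1, x2, y2] format."""
    a, b = a[:, None, :], b[None, :, :]
    iw = np.clip(np.minimum(a[..., 2], b[..., 2]) - np.maximum(a[..., 0], b[..., 0]), 0, None)
    ih = np.clip(np.minimum(a[..., 3], b[..., 3]) - np.maximum(a[..., 1], b[..., 1]), 0, None)
    inter = iw * ih
    union = ((a[..., 2] - a[..., 0]) * (a[..., 3] - a[..., 1])
             + (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1]) - inter)
    return inter / union


def match(tracks, dets, iou_min):
    """Hungarian on 1 - IoU; returns (pairs, unmatched track indices)."""
    if len(tracks) == 0 or len(dets) == 0:
        return [], list(range(len(tracks)))
    iou = iou_matrix(tracks, dets)
    rows, cols = linear_sum_assignment(1.0 - iou)
    pairs = [(int(r), int(c)) for r, c in zip(rows, cols) if iou[r, c] >= iou_min]
    matched = {r for r, _ in pairs}
    return pairs, [i for i in range(len(tracks)) if i not in matched]


def byte_associate(track_boxes, det_boxes, scores, high=0.6, iou_min=0.2):
    hi = np.flatnonzero(scores >= high)
    lo = np.flatnonzero(scores < high)
    # Pass 1: high-score detections against all tracks.
    pairs1, left = match(track_boxes, det_boxes[hi], iou_min)
    matches = [(t, int(hi[d])) for t, d in pairs1]
    # Pass 2: low-score detections against tracks left over from pass 1.
    left = np.asarray(left, dtype=int)
    pairs2, still_left = match(track_boxes[left], det_boxes[lo], iou_min)
    matches += [(int(left[t]), int(lo[d])) for t, d in pairs2]
    unmatched_tracks = [int(left[t]) for t in still_left]
    used_hi = {d for _, d in pairs1}
    new_track_dets = [int(hi[d]) for d in range(len(hi)) if d not in used_hi]
    # Unmatched low-score detections are dropped as likely background.
    return matches, unmatched_tracks, new_track_dets
```

In a typical case, a partly occluded person whose detector score falls to 0.3 is still linked to its track in pass 2 instead of losing its ID.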
15
BoT-SORT?
🔥 hard
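Answer: Builds on ByteTrack-style association and adds camera motion compensation (CMC), a refined Kalman filter state, and fused IoU + Re-ID costs; it reported strong MOTChallenge scores.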
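A simplified sketch of camera motion compensation, assuming a 2×3 affine matrix `A` (previous frame to current frame) has already been estimated, for example from feature tracks on the background. BoT-SORT applies the transform to the Kalman state mean and covariance; this version only warps the predicted box corners, which is enough to show why matching improves when the camera pans.

```python
import numpy as np


def warp_boxes(boxes, A):
    """
    Apply a 2x3 affine camera-motion matrix A to predicted boxes in
    [x1, y1, x2, y2] format and return the axis-aligned warped boxes.
    """
    boxes = np.asarray(boxes, dtype=float)
    A = np.asarray(A, dtype=float)
    corners = np.stack([boxes[:, [0, 1]], boxes[:, [2, 1]],
                        boxes[:, [0, 3]], boxes[:, [2, 3]]], axis=1)  # (N, 4, 2)
    warped = corners @ A[:, :2].T + A[:, 2]                           # (N, 4, 2)
    return np.concatenate([warped.min(axis=1), warped.max(axis=1)], axis=1)
```

A pure pan of (5, −3) px shifts every predicted box by the same amount, so a stationary person keeps full IoU with their new detection instead of appearing to jump.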
16
Dense crowds?
📊 medium
Answer: IoU-only methods degrade; appearance cues, richer motion models, or transformer-based MOT help.
17
Train appearance?
📊 medium
Answer: On person re-ID datasets (Market-1501, MARS, etc.), separately from the detector; DeepSORT's descriptor was trained on MARS. The domain gap between those datasets and the target scene matters.
18
SORT limits?
⚡ easy
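Answer: SORT assumes a good detector, since misses and false positives pass straight into the tracks; IoU association weakens under fast motion or low FPS, when predicted and detected boxes stop overlapping; and vanilla SORT does not model camera motion.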
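A small arithmetic check of the low-FPS failure, assuming a 40-px-wide pedestrian moving 20 px per frame at 30 FPS and a prediction that does not yet know the velocity (for example, a newly born track). At 5 FPS the same motion spans six source frames, and the boxes no longer overlap at all.

```python
def iou(a, b):
    """IoU of two boxes in [x1, y1, x2, y2] format."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


predicted = [0, 0, 40, 80]      # zero-velocity prediction at the last position
det_30fps = [20, 0, 60, 80]     # 1 frame at 30 FPS: moved 20 px
det_5fps = [120, 0, 160, 80]    # 1 frame at 5 FPS: moved 6 x 20 = 120 px

iou_30 = iou(predicted, det_30fps)  # overlap 20x80 over union 4800
iou_5 = iou(predicted, det_5fps)    # no overlap
```

At 30 FPS the pair still clears a 0.3 gate (IoU = 1/3), while at 5 FPS the IoU is exactly zero, so any IoU-only matcher treats the detection as a new object.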
19
vs end-to-end trackers?
🔥 hard
Answer: TrackFormer and MOTR predict tracks end-to-end with transformers, so there is no hand-crafted association step, but they typically need more training data and compute.
20
Production?
📊 medium
Answer: Run the tracker at the detector's frame rate; batch all crops from a frame through the Re-ID CNN in one forward pass; tune thresholds per scene; log ID switches for QA.
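Thresholds such as max_age are counted in frames, so a value tuned at one frame rate does not transfer to another. A hypothetical per-scene config that stores the lost-track timeout in seconds and converts it to frames might look like this; the field names and defaults are assumptions for illustration only.

```python
import math
from dataclasses import dataclass


@dataclass
class TrackerConfig:
    """Hypothetical per-scene tracker settings."""
    fps: float                 # frame rate the tracker actually runs at
    max_age_s: float = 1.0     # keep a lost track alive this many seconds
    min_hits: int = 3          # confirmation, counted in frames
    iou_min: float = 0.3

    @property
    def max_age_frames(self) -> int:
        # Round up so a short timeout never collapses to zero frames.
        return max(1, math.ceil(self.max_age_s * self.fps))
```

The same one-second timeout then becomes 30 frames for a 30 FPS camera and 10 frames for a 10 FPS one, instead of silently changing meaning when the frame rate does.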
SORT / DeepSORT Cheat Sheet
SORT
- Kalman + IoU
- Hungarian
DeepSORT
- Appearance
- Cascade
Follow-on
- ByteTrack
💡 Pro tip: DeepSORT adds appearance when IoU is not enough.