Detection vs tracking
Detection classifies and localizes all objects each frame—independent of history. Tracking exploits temporal continuity: prediction from the previous frame reduces search cost and stabilizes identity. Hybrid pipelines run a detector every N frames and a cheap tracker in between, or fuse detections with Kalman prediction and Hungarian matching.
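The "detector every N frames, tracker in between" schedule can be sketched in a few lines. This is a minimal illustration, not a reference implementation: `hybrid_track`, `detect`, `make_tracker`, and `every_n` are hypothetical names, and the tracker is only assumed to expose an OpenCV-style `update(frame)` that returns `(ok, box)`.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x, y, w, h), the convention OpenCV trackers use


def hybrid_track(
    frames: Iterable[object],
    detect: Callable[[object], Optional[Box]],      # hypothetical: expensive detector
    make_tracker: Callable[[object, Box], object],  # hypothetical: returns an object with .update(frame)
    every_n: int = 10,
) -> List[Optional[Box]]:
    """Run the detector every `every_n` frames and a cheap tracker in between."""
    results: List[Optional[Box]] = []
    tracker = None
    for i, frame in enumerate(frames):
        box: Optional[Box] = None
        # Between scheduled detections, let the tracker propagate the box.
        if tracker is not None and i % every_n != 0:
            ok, box = tracker.update(frame)
            if not ok:
                box = None
        if box is None:
            # Scheduled detection frame, or the tracker is missing or lost.
            box = detect(frame)
            tracker = make_tracker(frame, box) if box is not None else None
        results.append(box)
    return results
```

Falling back to the detector as soon as the tracker reports failure keeps the sketch simple; a real pipeline might tolerate a few missed frames before paying for a detection.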
Single-object
One box is initialized, by hand or from a detector, and the tracker updates it each frame. This is the case the OpenCV Tracker* API covers.
Multi-object (MOT)
Many IDs; needs association to match detections to trajectories across frames.
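To make the association step concrete, here is a small sketch that matches existing tracks to new detections by bounding-box overlap (IoU). It uses greedy matching, a simpler stand-in for the Hungarian method; an optimal assignment could be obtained with `scipy.optimize.linear_sum_assignment` instead. The names `iou`, `associate`, and the `min_iou` threshold are assumptions made for this example.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = min(ax + aw, bx + bw) - max(ax, bx)
    ih = min(ay + ah, by + bh) - max(ay, by)
    if iw <= 0 or ih <= 0:
        return 0.0  # boxes do not overlap
    inter = iw * ih
    return inter / (aw * ah + bw * bh - inter)


def associate(
    tracks: Dict[int, Box], detections: List[Box], min_iou: float = 0.3
) -> Tuple[Dict[int, int], List[int]]:
    """Greedy IoU matching.

    Returns ({track_id: detection_index}, [unmatched detection indices]).
    """
    # Score every track/detection pair, best overlap first.
    pairs = sorted(
        (
            (iou(box, det), tid, j)
            for tid, box in tracks.items()
            for j, det in enumerate(detections)
        ),
        reverse=True,
    )
    matches: Dict[int, int] = {}
    used = set()
    for score, tid, j in pairs:
        if score < min_iou:
            break  # remaining pairs overlap too little to be the same object
        if tid in matches or j in used:
            continue  # track or detection already claimed by a better pair
        matches[tid] = j
        used.add(j)
    unmatched = [j for j in range(len(detections)) if j not in used]
    return matches, unmatched
```

Unmatched detections would typically start new track IDs, and tracks that stay unmatched for several frames would be dropped.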
OpenCV: CSRT tracker (example)
CSRT (a discriminative correlation filter with Channel and Spatial Reliability) is more accurate than KCF but slower. On recent OpenCV 4.x builds (4.5.1 and later), the original tracker API lives under cv2.legacy.
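import cv2

cap = cv2.VideoCapture("clip.mp4")
ok, frame = cap.read()
if not ok:
    raise SystemExit("Could not read the first frame from clip.mp4")

# Draw the initial box by hand; selectROI returns (x, y, w, h).
bbox = cv2.selectROI("ROI", frame, showCrosshair=True, fromCenter=False)
cv2.destroyWindow("ROI")

tracker = cv2.legacy.TrackerCSRT_create()
tracker.init(frame, bbox)

while True:
    ok, frame = cap.read()
    if not ok:
        break  # end of video or read error
    ok, bbox = tracker.update(frame)
    if ok:
        # Draw the tracked box only when the tracker reports success.
        x, y, w, h = [int(v) for v in bbox]
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("track", frame)
    if cv2.waitKey(1) & 0xFF == 27:  # Esc quits
        break

cap.release()
cv2.destroyAllWindows()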
If cv2.legacy is missing, try cv2.TrackerCSRT_create() on older builds, or install opencv-contrib-python.
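The fallback described above can be wrapped in a small helper so the rest of the script does not care which namespace a given build exposes. This is a sketch: `create_tracker` is a hypothetical helper, and it takes the imported `cv2` module as an argument so the lookup logic stays easy to test.

```python
def create_tracker(cv, name: str = "CSRT"):
    """Return a tracker from whichever namespace this OpenCV build exposes.

    `cv` is the imported cv2 module; `name` is e.g. "CSRT", "KCF" or "MIL".
    """
    factory = f"Tracker{name}_create"
    # Try cv2.legacy first (as in the example above), then the top-level module.
    for namespace in (getattr(cv, "legacy", None), cv):
        if namespace is not None and hasattr(namespace, factory):
            return getattr(namespace, factory)()
    raise RuntimeError(
        f"{factory} not found; this build may lack the contrib modules "
        "(try installing opencv-contrib-python)"
    )


# Usage, assuming OpenCV is installed:
#   import cv2
#   tracker = create_tracker(cv2, "CSRT")
#   tracker.init(frame, bbox)
```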
Faster option: KCF
tracker = cv2.legacy.TrackerKCF_create()
tracker.init(frame, bbox)
KCF is faster; CSRT handles deformation and occlusion slightly better. MOSSE is older and very fast but brittle on scale change.
When trackers fail
- Drift — model updates on wrong pixels; use conservative learning rates or stop updating on low confidence.
- Occlusion / motion blur — switch to detection-based re-id (DeepSORT) or manual re-init.
- Scale / out-of-plane rotation — use scale-pyramid extensions or bounding-box regression from a detector.
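One simple way to decide when to fall back to a detector or a manual re-init is to count consecutive failed updates. The sketch below is illustrative only: `LossMonitor` and its `max_misses` threshold are hypothetical, and it relies solely on the boolean that `tracker.update()` returns, which does not catch silent drift.

```python
class LossMonitor:
    """Counts consecutive failed updates and signals when a re-init is due."""

    def __init__(self, max_misses: int = 5):
        self.max_misses = max_misses  # hypothetical threshold; tune per clip
        self.misses = 0

    def observe(self, ok: bool) -> bool:
        """Record one tracker.update() result; return True when re-init is due."""
        self.misses = 0 if ok else self.misses + 1
        if self.misses >= self.max_misses:
            self.misses = 0  # start counting afresh after the re-init
            return True
        return False


# Usage inside a tracking loop (the detector is assumed, not shown):
#   ok, bbox = tracker.update(frame)
#   if monitor.observe(ok):
#       bbox = run_detector(frame)   # hypothetical function
#       tracker = make_new_tracker() # hypothetical function
#       tracker.init(frame, bbox)
```

Tolerating a few misses before re-initializing avoids paying for a detection on every brief occlusion or blurred frame.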
MIL tracker (brief)
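MIL (Multiple Instance Learning) trains its classifier on bags of patches sampled around the box rather than on a single positive patch, so it does not need to know exactly which patch is the true target. This makes it more robust to slight misalignment than naive correlation trackers. Create with cv2.legacy.TrackerMIL_create() where available.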
Takeaways
- Classic OpenCV trackers = single-object, short-term, init once.
- For many objects + IDs, combine a detector with SORT / DeepSORT.
- Profile CSRT vs KCF on your resolution and FPS budget.
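For the profiling step, a small timing helper is usually enough. The sketch below measures only the time spent inside the update call, so decoding and display do not skew the comparison; `measure_fps` is a hypothetical name, and `update` can be any callable such as `tracker.update`.

```python
import time
from typing import Callable, Iterable


def measure_fps(update: Callable[[object], object], frames: Iterable[object]) -> float:
    """Average frames per second spent inside `update` over `frames`."""
    count = 0
    elapsed = 0.0
    for frame in frames:
        start = time.perf_counter()
        update(frame)
        elapsed += time.perf_counter() - start
        count += 1
    if count == 0:
        return 0.0  # nothing was measured
    return count / elapsed if elapsed > 0 else float("inf")


# Usage, assuming `frames` is a list of decoded images and both trackers
# were initialized on the same first frame and box:
#   print("CSRT:", measure_fps(csrt.update, frames))
#   print("KCF: ", measure_fps(kcf.update, frames))
```

Run it on frames at the resolution you plan to deploy, since tracker cost depends on image and box size.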
Quick FAQ
Multiple objects: create one tracker per object, init it with its ROI, and call update per frame. For consistent IDs across occlusions, prefer detection + association (next chapters).
Mean shift: color-histogram methods (cv2.meanShift, CamShift) work on controlled color distributions; modern pipelines usually prefer learned trackers or detectors.
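Running one single-object tracker per object can be sketched as a small container that hands out IDs. `TrackerPool` is a hypothetical class written for this example; it only assumes each tracker has OpenCV-style `init(frame, box)` and `update(frame)` methods.

```python
from typing import Callable, Dict, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x, y, w, h)


class TrackerPool:
    """Keeps one single-object tracker per ID."""

    def __init__(self, factory: Callable[[], object]):
        self.factory = factory  # e.g. cv2.legacy.TrackerKCF_create
        self.trackers: Dict[int, object] = {}
        self.next_id = 0

    def add(self, frame, box: Box) -> int:
        """Start tracking a new object and return its ID."""
        tracker = self.factory()
        tracker.init(frame, box)
        track_id = self.next_id
        self.trackers[track_id] = tracker
        self.next_id += 1
        return track_id

    def update(self, frame) -> Dict[int, Optional[Box]]:
        """Update every tracker; a lost object maps to None."""
        out: Dict[int, Optional[Box]] = {}
        for track_id, tracker in self.trackers.items():
            ok, box = tracker.update(frame)
            out[track_id] = tuple(int(v) for v in box) if ok else None
        return out
```

Cost grows linearly with the number of objects, and IDs are only as stable as each individual tracker, which is why detection plus association tends to scale better.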