Computer Vision Chapter 22

Object tracking basics

Visual object tracking estimates the state of a target (usually a bounding box or mask) across video frames, given an initialization in the first frame. Short-term trackers assume that the target stays visible and that its appearance changes gradually. Long-term or tracking-by-detection systems re-acquire targets after occlusion using a detector plus data association (SORT and DeepSORT, covered in the next chapters). OpenCV ships classic correlation-filter and MIL trackers that are suitable for prototypes and learning.

Detection vs tracking

Detection classifies and localizes all objects each frame—independent of history. Tracking exploits temporal continuity: prediction from the previous frame reduces search cost and stabilizes identity. Hybrid pipelines run a detector every N frames and a cheap tracker in between, or fuse detections with Kalman prediction and Hungarian matching.
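The "detector every N frames, cheap tracker in between" pattern can be sketched as below. This is only an illustration under stated assumptions: detect, init_tracker, and update_tracker are hypothetical callables standing in for a real detector and tracker, and the policy of forcing a detection after a failed update is one reasonable choice among several.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x, y, w, h)


def hybrid_track(
    frames: Iterable[object],
    detect: Callable[[object], Optional[Box]],
    init_tracker: Callable[[object, Box], object],
    update_tracker: Callable[[object, object], Tuple[bool, Box]],
    detect_every: int = 10,
) -> List[Optional[Box]]:
    """Run the detector every `detect_every` frames and the tracker in between.

    All three callables are hypothetical stand-ins, not a real library API.
    Returns one box (or None when the target is lost) per frame.
    """
    results: List[Optional[Box]] = []
    tracker = None
    for i, frame in enumerate(frames):
        box: Optional[Box] = None
        if tracker is None or i % detect_every == 0:
            # Expensive path: detect, then restart the tracker from the detection.
            box = detect(frame)
            tracker = init_tracker(frame, box) if box is not None else None
        else:
            # Cheap path: propagate the box from the previous frame.
            ok, tracked = update_tracker(tracker, frame)
            if ok:
                box = tracked
            else:
                tracker = None  # force a detection on the next frame
        results.append(box)
    return results
```

A larger detect_every lowers the average cost per frame but lets drift accumulate for longer before the detector corrects it.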

Single-object

One initialized box that the tracker updates each frame; this is the case the OpenCV Tracker* API covers.

Multi-object (MOT)

Many IDs; needs association to match detections to trajectories across frames.
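To make the association step concrete, here is a minimal sketch that matches existing tracks to new detections by intersection over union (IoU). It uses greedy matching as a simpler stand-in for the Hungarian algorithm mentioned above; the function names and the 0.3 threshold are illustrative assumptions, not taken from SORT or any library.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = min(ax + aw, bx + bw) - max(ax, bx)
    ih = min(ay + ah, by + bh) - max(ay, by)
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    return inter / (aw * ah + bw * bh - inter)


def associate(
    tracks: Dict[int, Box], detections: List[Box], min_iou: float = 0.3
) -> Tuple[Dict[int, int], List[int]]:
    """Greedily pair tracks with detections, highest IoU first.

    Returns ({track_id: detection_index}, [unmatched detection indices]).
    """
    pairs = sorted(
        (
            (iou(box, det), track_id, j)
            for track_id, box in tracks.items()
            for j, det in enumerate(detections)
        ),
        reverse=True,
    )
    matches: Dict[int, int] = {}
    used = set()
    for score, track_id, j in pairs:
        if score < min_iou:
            break  # remaining pairs overlap too little to be the same object
        if track_id in matches or j in used:
            continue
        matches[track_id] = j
        used.add(j)
    unmatched = [j for j in range(len(detections)) if j not in used]
    return matches, unmatched
```

Unmatched detections would typically start new trajectories with fresh IDs, while tracks that stay unmatched for several frames are dropped.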

OpenCV: CSRT tracker (example)

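CSRT (a discriminative correlation filter with channel and spatial reliability, also written CSR-DCF) is generally more accurate but slower than KCF. In recent OpenCV 4.x releases (4.5.1 and later), the original tracker API lives under cv2.legacy, which is part of the contrib tracking module.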

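import cv2

cap = cv2.VideoCapture("clip.mp4")
ok, frame = cap.read()
if not ok:
    raise SystemExit("Could not read the first frame from clip.mp4")

# Draw a box around the target, then press ENTER or SPACE to confirm.
bbox = cv2.selectROI("ROI", frame, showCrosshair=True, fromCenter=False)
cv2.destroyWindow("ROI")

tracker = cv2.legacy.TrackerCSRT_create()
tracker.init(frame, bbox)

while True:
    ok, frame = cap.read()
    if not ok:
        break  # end of video or read error
    ok, bbox = tracker.update(frame)
    if ok:
        # The tracker returns floats; drawing needs integer pixel coordinates.
        x, y, w, h = [int(v) for v in bbox]
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("track", frame)
    if cv2.waitKey(1) & 0xFF == 27:  # Esc quits
        break

cap.release()
cv2.destroyAllWindows()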

If cv2.legacy is missing, the build either predates the legacy namespace (try cv2.TrackerCSRT_create() instead) or lacks the contrib modules (install opencv-contrib-python).
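The fallback described above can be wrapped in a small helper. This is a hedged sketch: create_tracker is a hypothetical name, and it assumes only that factories follow the Tracker<Name>_create naming used in this chapter. The cv2 module is passed in as an argument so the helper itself has no import-time dependency.

```python
def create_tracker(cv, name="CSRT"):
    """Return a new tracker, trying cv.legacy first and then the top-level module.

    `cv` is the imported cv2 module; `name` is e.g. "CSRT", "KCF", or "MIL".
    This helper is illustrative and not part of OpenCV.
    """
    factory_name = f"Tracker{name}_create"
    for namespace in (getattr(cv, "legacy", None), cv):
        factory = getattr(namespace, factory_name, None)
        if callable(factory):
            return factory()
    raise RuntimeError(
        f"{factory_name} not found; this OpenCV build may lack the tracking "
        "module (try installing opencv-contrib-python)."
    )
```

With OpenCV imported as cv2, the call would be create_tracker(cv2, "CSRT"). The legacy namespace is tried first only to match the examples in this chapter; reversing the order would prefer the newer API.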

Faster option: KCF

# Drop-in replacement for the CSRT creation line; the rest of the loop is unchanged.
tracker = cv2.legacy.TrackerKCF_create()
tracker.init(frame, bbox)

KCF is faster; CSRT is usually somewhat more robust to deformation and partial occlusion. MOSSE is older and very fast but brittle under scale change.
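Because the speed gap depends on resolution and hardware, it is worth measuring rather than assuming. The sketch below times any per-frame update callable and compares the result with a frame-rate budget; mean_update_ms and fits_budget are hypothetical helper names, and the frames are assumed to be preloaded so that decoding time is excluded.

```python
import time
from typing import Callable, Iterable


def mean_update_ms(update: Callable[[object], object], frames: Iterable[object]) -> float:
    """Average wall-clock time of update(frame), in milliseconds."""
    total = 0.0
    count = 0
    for frame in frames:
        start = time.perf_counter()
        update(frame)
        total += time.perf_counter() - start
        count += 1
    if count == 0:
        raise ValueError("no frames to time")
    return 1000.0 * total / count


def fits_budget(ms_per_frame: float, target_fps: float) -> bool:
    """True if the per-frame cost does not exceed the frame period."""
    return ms_per_frame <= 1000.0 / target_fps
```

For a tracker that has already been initialized, mean_update_ms(tracker.update, frames) gives the figure to compare; at 30 FPS the whole per-frame budget is about 33 ms, and the tracker shares it with decoding and display.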

When trackers fail

  • Drift — model updates on wrong pixels; use conservative learning rates or stop updating on low confidence.
  • Occlusion / motion blur — switch to detection-based re-id (DeepSORT) or manual re-init.
  • Scale / out-of-plane rotation — use scale-pyramid extensions or bounding-box regression from a detector.
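One simple way to act on these failure modes is to watch for repeated failed updates or implausible boxes and then hand control back to a detector or the user. The sketch below is an assumption-laden illustration: LossMonitor and box_in_frame are hypothetical helpers, and the thresholds are arbitrary. Note also that a tracker's success flag is a coarse signal, so a drifting tracker may keep reporting success.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h)


class LossMonitor:
    """Signals when updates have failed for `patience` consecutive frames."""

    def __init__(self, patience: int = 5):
        self.patience = patience
        self.misses = 0

    def step(self, ok: bool) -> bool:
        """Record one update result; return True when re-initialization is due."""
        self.misses = 0 if ok else self.misses + 1
        return self.misses >= self.patience


def box_in_frame(box: Box, frame_w: int, frame_h: int, min_size: int = 4) -> bool:
    """Cheap sanity check: the box is not tiny and overlaps the image area."""
    x, y, w, h = box
    return (
        w >= min_size
        and h >= min_size
        and x + w > 0
        and y + h > 0
        and x < frame_w
        and y < frame_h
    )
```

In a tracking loop, the value passed to step could be the tracker's success flag combined with box_in_frame; when step returns True, the loop would run a detector or ask for a new ROI.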

MIL tracker (brief)

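MIL (Multiple Instance Learning) trains its classifier on bags of image patches sampled around the box rather than on a single positive patch. A positive bag only needs to contain one well-aligned patch, which makes the tracker more robust to slight misalignment than naive correlation trackers. Create it with cv2.legacy.TrackerMIL_create() where available.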

Takeaways

  • Classic OpenCV trackers = single-object, short-term, init once.
  • For many objects + IDs, combine a detector with SORT / DeepSORT.
  • Profile CSRT vs KCF on your resolution and FPS budget.

Quick FAQ

To track several objects with the classic trackers, maintain a list of tracker instances, initialize each with its own ROI, and call update on every instance per frame. For consistent IDs across occlusions, prefer detection plus association (next chapters).
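A minimal sketch of that per-frame update follows. It assumes each tracker exposes an OpenCV-style update(frame) that returns (ok, bbox); update_all is a hypothetical helper name, and the trackers are keyed by caller-chosen IDs rather than kept in a plain list so that results stay attached to an identity.

```python
from typing import Dict, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x, y, w, h)


def update_all(trackers: Dict[int, object], frame: object) -> Dict[int, Optional[Box]]:
    """Update every tracker on one frame.

    Returns {track_id: (x, y, w, h)} with None for trackers that reported failure.
    """
    boxes: Dict[int, Optional[Box]] = {}
    for track_id, tracker in trackers.items():
        ok, bbox = tracker.update(frame)
        boxes[track_id] = tuple(int(v) for v in bbox) if ok else None
    return boxes
```

The cost grows linearly with the number of trackers, which is one more reason to move to detection plus association once there are more than a handful of targets.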

The color-histogram methods in OpenCV (cv2.meanShift, cv2.CamShift) work when the target's color distribution is distinctive and stable; modern pipelines usually prefer learned trackers or detectors.