Computer Vision Chapter 5

Local Feature Detectors

Harris corners, SIFT, and ORB—scale and rotation aware keypoints for matching and tracking.

Harris corner detector

Idea: structure tensor

Let Ix, Iy be image gradients. Over a window, form the second-moment matrix M from summed outer products of gradients. Its eigenvalues λ₁, λ₂ describe edge strength in two orthogonal directions: both large → corner; one large → edge; both small → flat. Harris uses the response R = det(M) − k·trace(M)² (with empirical k ≈ 0.04–0.06) to avoid explicit eigenvalue decomposition.
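The classification above can be checked numerically. The following is a minimal pure-Python sketch, not OpenCV's implementation: `structure_tensor` and `harris_response` are hypothetical helper names, and the gradient lists are made-up windows chosen to represent a corner, an edge, and a flat patch.

```python
def structure_tensor(grads):
    """Sum outer products of (Ix, Iy) gradients over a window.

    Returns the three distinct entries (Sxx, Sxy, Syy) of the symmetric
    2x2 second-moment matrix M.
    """
    sxx = sum(ix * ix for ix, _ in grads)
    sxy = sum(ix * iy for ix, iy in grads)
    syy = sum(iy * iy for _, iy in grads)
    return sxx, sxy, syy


def harris_response(sxx, sxy, syy, k=0.04):
    """R = det(M) - k * trace(M)^2, with no eigen-decomposition."""
    det = sxx * syy - sxy * sxy
    trace = sxx + syy
    return det - k * trace * trace


# Toy windows (assumed values, for illustration only)
corner = [(1, 0), (0, 1), (1, 0), (0, 1)]   # gradients in both directions
edge = [(1, 0), (1, 0), (1, 0), (1, 0)]     # gradients in one direction
flat = [(0, 0), (0, 0), (0, 0), (0, 0)]     # no gradients

r_corner = harris_response(*structure_tensor(corner))
r_edge = harris_response(*structure_tensor(edge))
r_flat = harris_response(*structure_tensor(flat))
```

With these inputs R is positive for the corner, negative for the edge, and zero for the flat window, which is the sign pattern a threshold on R relies on.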

blockSize

Neighborhood size, in pixels, over which gradient products are summed (any positive integer, e.g. 2 or 3). Larger → smoother response, fewer duplicate peaks.

ksize

Sobel aperture for computing Ix, Iy; must be odd (e.g. 3).

k

Harris free parameter in the response; typical range 0.04–0.06. Too small → more edge responses.
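To see why a small k admits edge-like points, consider a hypothetical elongated structure whose matrix M has eigenvalues 10 and 1 (assumed values). A short sketch, using det(M) = λ₁λ₂ and trace(M) = λ₁ + λ₂:

```python
def harris_from_eigenvalues(lam1, lam2, k):
    """Harris response written in terms of the eigenvalues of M."""
    return lam1 * lam2 - k * (lam1 + lam2) ** 2


# Elongated (edge-like) structure: one eigenvalue dominates (assumed values)
lam1, lam2 = 10.0, 1.0

responses = {k: harris_from_eigenvalues(lam1, lam2, k) for k in (0.04, 0.06, 0.10)}
for k, r in responses.items():
    print(f"k={k:.2f}  R={r:+.2f}")
```

R shrinks as k grows and turns negative once k exceeds 10/121 (about 0.083), so a larger k rejects this edge-like point while a smaller k lets it through.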

cv2.cornerHarris

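Input must be a single-channel image (8-bit or float32); converting to float32 first is the common convention. Output is a single-channel float32 response map the same size as the input; threshold it and keep local maxima to list corners.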

import cv2
import numpy as np

gray = cv2.imread("checkerboard.png", cv2.IMREAD_GRAYSCALE)
gray_f = np.float32(gray)

block, ksz, k = 3, 3, 0.04
resp = cv2.cornerHarris(gray_f, block, ksz, k)

# Dilate (default 3x3 kernel) so each strong peak marks a small blob for display;
# this thickens the marks and is not non-maximum suppression by itself
resp_d = cv2.dilate(resp, None)
vis = np.zeros_like(gray)
vis[resp_d > 0.01 * resp_d.max()] = 255

Stricter threshold

thresh = 0.05 * resp.max()
mask = resp > thresh
# optional: keep only local maxima of `resp` on `mask` with further NMS
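The optional non-maximum suppression (NMS) step mentioned in the comment above can be sketched as follows. With OpenCV the same test is commonly written as a comparison between the response and its dilation; the version below is plain Python so that the logic is explicit. `local_maxima` is a hypothetical helper and the response values are made up.

```python
def local_maxima(resp, thresh):
    """Return (row, col) of pixels above thresh that are the maximum of
    their 3x3 neighborhood (window clipped at the image border)."""
    rows, cols = len(resp), len(resp[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            v = resp[r][c]
            if v <= thresh:
                continue
            neighborhood = [
                resp[rr][cc]
                for rr in range(max(r - 1, 0), min(r + 2, rows))
                for cc in range(max(c - 1, 0), min(c + 2, cols))
            ]
            if v >= max(neighborhood):
                peaks.append((r, c))
    return peaks


# Toy response map (assumed values): one blurred peak and one isolated peak
toy = [
    [0.0, 0.1, 0.0, 0.0, 0.0],
    [0.1, 0.9, 0.6, 0.0, 0.0],
    [0.0, 0.6, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.8],
]
peaks = local_maxima(toy, thresh=0.05 * 0.9)
```

Thresholding alone would keep six pixels here; the neighborhood test reduces them to one point per peak. Pixels that tie on a plateau all pass this simple test, so a real pipeline may need a tie-break.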

Sub-pixel refinement

cornerSubPix refines corner locations to sub-pixel accuracy using the local intensity pattern—useful for calibration, stitching, and metrology.

import cv2
import numpy as np

gray = cv2.imread("grid.jpg", cv2.IMREAD_GRAYSCALE)
gray_f = np.float32(gray)
resp = cv2.cornerHarris(gray_f, 3, 3, 0.04)
yxs = np.argwhere(resp > 0.01 * resp.max()).astype(np.float32)
# cornerSubPix expects shape (N, 1, 2) with (x, y) order
pts = np.zeros((len(yxs), 1, 2), dtype=np.float32)
pts[:, 0, 0] = yxs[:, 1]
pts[:, 0, 1] = yxs[:, 0]

criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 40, 0.001)
refined = cv2.cornerSubPix(gray, pts, (5, 5), (-1, -1), criteria)
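The idea behind this kind of refinement, as described in the OpenCV documentation, is that at a true corner q every image gradient g at a nearby point p is orthogonal to q − p. Summing g gᵀ (q − p) = 0 over a window gives a 2×2 linear system for q. The following is a one-step sketch of that system on synthetic data; `refine_corner` is a hypothetical helper, the sample points and gradients are made up, and the real function iterates on actual image gradients.

```python
def refine_corner(samples):
    """Solve sum(g g^T) q = sum(g g^T p) for the corner location q.

    samples: list of ((px, py), (gx, gy)) pairs, i.e. a point and the
    image gradient observed there.
    """
    a11 = a12 = a22 = b1 = b2 = 0.0
    for (px, py), (gx, gy) in samples:
        a11 += gx * gx
        a12 += gx * gy
        a22 += gy * gy
        b1 += gx * gx * px + gx * gy * py
        b2 += gx * gy * px + gy * gy * py
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("gradients do not constrain both axes")
    qx = (a22 * b1 - a12 * b2) / det
    qy = (a11 * b2 - a12 * b1) / det
    return qx, qy


# Synthetic corner at (2.3, 1.7): a vertical edge at x = 2.3 (horizontal
# gradients) meets a horizontal edge at y = 1.7 (vertical gradients).
samples = [
    ((2.3, 0.0), (1.0, 0.0)),
    ((2.3, 1.0), (1.0, 0.0)),
    ((0.0, 1.7), (0.0, 1.0)),
    ((1.0, 1.7), (0.0, 1.0)),
]
qx, qy = refine_corner(samples)
```

The solution lands on the non-integer location (2.3, 1.7), which is the sense in which the refined coordinates are sub-pixel.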

Harris vs Shi-Tomasi

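goodFeaturesToTrack implements the Shi-Tomasi criterion by default: a point is kept when the smaller eigenvalue of M exceeds a threshold, which often yields a cleaner set of points for tracking. cornerHarris gives a dense response map you threshold yourself; goodFeaturesToTrack returns at most maxCorners points, strongest first and at least minDistance apart. Passing useHarrisDetector=True makes it score candidates with the Harris response instead, as in the first call below; the second call uses the default Shi-Tomasi score.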

corners = cv2.goodFeaturesToTrack(
    gray, maxCorners=100, qualityLevel=0.01, minDistance=10,
    blockSize=3, useHarrisDetector=True, k=0.04)

corners_st = cv2.goodFeaturesToTrack(
    gray, maxCorners=100, qualityLevel=0.01, minDistance=10, blockSize=3)
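The two criteria score the same matrix M differently. A minimal sketch, with hypothetical helper names and assumed matrix entries, that evaluates both scores for a corner and two edges:

```python
import math


def min_eigenvalue(sxx, sxy, syy):
    """Smaller eigenvalue of the symmetric matrix [[sxx, sxy], [sxy, syy]]
    (the Shi-Tomasi score)."""
    half_trace = (sxx + syy) / 2.0
    radius = math.sqrt(((sxx - syy) / 2.0) ** 2 + sxy * sxy)
    return half_trace - radius


def harris_response(sxx, sxy, syy, k=0.04):
    """Harris score for the same matrix."""
    return (sxx * syy - sxy * sxy) - k * (sxx + syy) ** 2


# (Sxx, Sxy, Syy) for three assumed windows
windows = {
    "corner": (2.0, 0.0, 2.0),
    "edge": (4.0, 0.0, 0.0),
    "diagonal edge": (2.0, 2.0, 2.0),
}
scores = {
    name: (min_eigenvalue(*m), harris_response(*m))
    for name, m in windows.items()
}
```

Both edges get a minimum eigenvalue of zero and a negative Harris response, while the corner scores positively under both. The Shi-Tomasi score has the practical advantage that its threshold is expressed directly in eigenvalue units and needs no k.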

Takeaways

  • Use float32 input; scale threshold relative to resp.max().
  • Increase blockSize to suppress duplicate corners on thick edges.
  • Use cornerSubPix when you need accurate coordinates, not just detection.

Quick FAQ

Too many or clustered corners: raise the response threshold, increase blockSize, or pre-blur slightly. Alternatively, switch to Shi-Tomasi with minDistance and a lower maxCorners.

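Scale invariance: Harris is not scale invariant. The same corner seen at a different zoom level responds at a different scale, so a fixed window size can miss it. Use SIFT/ORB or a multi-scale Harris pyramid if you need scale robustness.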

SIFT

Pipeline in brief

  1. Build a scale space with Gaussian blur at multiple scales per octave.
  2. Take Difference of Gaussians (DoG); find 3D extrema (x, y, scale).
  3. Refine location, discard low-contrast and edge-like points.
  4. Assign dominant orientation from gradient histograms.
  5. Sample a canonical 16×16 neighborhood into a 4×4 grid of 8-bin orientation histograms → 128 floats per keypoint.
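As a rough illustration of steps 1, 2, and 5, the sketch below lists the blur levels for one octave and the resulting descriptor length. It assumes the scheme from Lowe's paper, in which an octave with S layers uses S + 3 Gaussian images with sigma multiplied by 2^(1/S) between neighbors. `octave_sigmas` is a hypothetical helper; 1.6 and 3 are the sigma and nOctaveLayers values used with SIFT_create later in this chapter.

```python
def octave_sigmas(sigma0=1.6, n_layers=3):
    """Blur levels for one octave: n_layers + 3 Gaussian images."""
    step = 2.0 ** (1.0 / n_layers)
    return [sigma0 * step ** i for i in range(n_layers + 3)]


sigmas = octave_sigmas()
n_dog = len(sigmas) - 1          # adjacent Gaussians are subtracted pairwise
n_extrema_layers = n_dog - 2     # extrema need a DoG layer above and below

# Descriptor layout: 16x16 samples -> 4x4 cells, 8 orientation bins each
cells_per_side, bins = 4, 8
descriptor_length = cells_per_side * cells_per_side * bins
```

With three layers per octave this gives six Gaussian images, five DoG images, and three layers in which extrema can be searched; sigma doubles after three steps, which is where the next octave starts.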

Descriptor distance

Use Euclidean (L2) or L1; BFMatcher with NORM_L2 is the usual baseline.
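For reference, the two distances can be written out directly. This is a small sketch on 4-D stand-ins for 128-D descriptors; the vectors are assumed values and the helper names are hypothetical.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def l1_distance(a, b):
    """Sum of absolute differences between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Tiny 4-D stand-ins for 128-D SIFT descriptors (assumed values)
query = [10.0, 0.0, 3.0, 4.0]
candidates = [[10.0, 0.0, 0.0, 0.0], [0.0, 10.0, 3.0, 4.0]]

l2 = [l2_distance(query, c) for c in candidates]
l1 = [l1_distance(query, c) for c in candidates]
best = min(range(len(candidates)), key=lambda i: l2[i])
```

A brute-force matcher does essentially this for every query descriptor against every candidate and keeps the smallest distance.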

When to prefer SIFT

Texture-rich scenes, moderate viewpoint change, when ORB struggles with repeatability.

SIFT_create and detectAndCompute

import cv2

gray = cv2.imread("building.jpg", cv2.IMREAD_GRAYSCALE)
sift = cv2.SIFT_create(nfeatures=500, nOctaveLayers=3, contrastThreshold=0.04,
                       edgeThreshold=10, sigma=1.6)
kp, des = sift.detectAndCompute(gray, None)

print(len(kp), None if des is None else des.shape)

contrastThreshold ↑ → fewer weak keypoints. edgeThreshold ↑ → more points along elongated structures.
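The edgeThreshold behavior follows from the edge test in Lowe's paper: with H the 2×2 Hessian of the DoG image at the keypoint and r the threshold, a point is kept when trace(H)² / det(H) < (r + 1)² / r. A minimal sketch, with `passes_edge_test` as a hypothetical helper and made-up Hessian entries:

```python
def passes_edge_test(dxx, dyy, dxy, r=10.0):
    """Keep a keypoint if its principal-curvature ratio is below r."""
    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:                 # curvatures of opposite sign: reject
        return False
    return trace * trace * r < (r + 1.0) ** 2 * det


# Elongated structure: curvature 15 across, 1 along (assumed values)
elongated = (15.0, 1.0, 0.0)
# Blob-like structure: similar curvature in both directions
blob = (3.0, 2.5, 0.0)

kept_r10 = passes_edge_test(*elongated, r=10.0)
kept_r20 = passes_edge_test(*elongated, r=20.0)
blob_kept = passes_edge_test(*blob, r=10.0)
```

The elongated point is rejected at r = 10 and kept at r = 20, while the blob passes at either setting, which matches the rule of thumb above.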

Detect only, then compute

kp = sift.detect(gray, None)
kp, des = sift.compute(gray, kp)

Brute-force L2 matching + ratio test

import cv2

im1 = cv2.imread("a.jpg", cv2.IMREAD_GRAYSCALE)
im2 = cv2.imread("b.jpg", cv2.IMREAD_GRAYSCALE)
sift = cv2.SIFT_create()
k1, d1 = sift.detectAndCompute(im1, None)
k2, d2 = sift.detectAndCompute(im2, None)

bf = cv2.BFMatcher(cv2.NORM_L2, crossCheck=False)
pairs = bf.knnMatch(d1, d2, k=2)

good = []
for pair in pairs:
    if len(pair) < 2:
        continue
    m, n = pair
    if m.distance < 0.7 * n.distance:
        good.append(m)

vis = cv2.drawMatches(im1, k1, im2, k2, good[:80], None,
                      flags=cv2.DrawMatchesFlags_NOT_DRAW_SINGLE_POINTS)  # same as flags=2

FLANN for larger descriptor sets

For thousands of keypoints, FLANN can be faster than exhaustive BF matching. Use KD-tree or k-means index parameters tuned to float descriptors.

import cv2
import numpy as np

FLANN_INDEX_KDTREE = 1
index_params = dict(algorithm=FLANN_INDEX_KDTREE, trees=5)
search_params = dict(checks=50)
flann = cv2.FlannBasedMatcher(index_params, search_params)

d1f = d1.astype(np.float32)
d2f = d2.astype(np.float32)
pairs = flann.knnMatch(d1f, d2f, k=2)

Environment notes

If cv2.SIFT_create is missing, install a recent opencv-python: SIFT moved into the main build in OpenCV 4.4.0, after the patent expired. Older wheels required opencv-contrib-python, where it was exposed as cv2.xfeatures2d.SIFT_create. Always check your OpenCV version with print(cv2.__version__).

Takeaways

  • SIFT descriptors are float vectors—match with NORM_L2 (or L1).
  • Use kNN + ratio or geometry (findHomography + RANSAC) to drop outliers.
  • Heavier than ORB; use FLANN when matching large batches.

Quick FAQ

ORB is faster and uses compact binary descriptors; SIFT is often stronger on difficult pairs but costs more CPU and memory. Profile on target hardware.

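OpenCV's SIFT descriptors are already normalized, but not to unit length: each 128-D vector is L2-normalized, clipped, renormalized, and then scaled and stored as float32 values in the 0 to 255 range, so all descriptors have a similar norm. L2 matching works on them directly. If you modify the vectors, re-normalize them consistently before L2 matching.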

ORB

What ORB builds on

  • FAST: compares pixels on a circle around the candidate—very fast corner score.
  • Orientation: intensity centroid offset gives a main angle per keypoint.
  • rBRIEF: pairwise intensity tests in a rotated pattern → fixed-length binary string (often 256 bits = 32 bytes in OpenCV).
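The orientation step can be sketched with first-order image moments: the angle is atan2(m01, m10), where m10 and m01 are the x- and y-weighted intensity sums measured from the patch center. The helper below is hypothetical and uses a small square patch for brevity, whereas ORB works on a circular patch.

```python
import math


def centroid_angle(patch):
    """Orientation from the intensity centroid of a square patch.

    Coordinates are measured from the patch center; the angle is
    atan2(m01, m10), in radians.
    """
    size = len(patch)
    half = (size - 1) / 2.0
    m10 = m01 = 0.0
    for row_idx, row in enumerate(patch):
        for col_idx, value in enumerate(row):
            m10 += (col_idx - half) * value   # x-weighted intensity
            m01 += (row_idx - half) * value   # y-weighted intensity
    return math.atan2(m01, m10)


# Toy patches (assumed values)
bright_right = [[0, 0, 1], [0, 0, 1], [0, 0, 1]]
bright_bottom = [[0, 0, 0], [0, 0, 0], [1, 1, 1]]

angle_right = centroid_angle(bright_right)
angle_bottom = centroid_angle(bright_bottom)
```

The angle points from the patch center toward the brighter side: 0 for the first patch and π/2 for the second (the image y-axis points down). Rotating the BRIEF test pattern by this angle is what makes the descriptor rotation aware.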

ORB_create parameters

import cv2

orb = cv2.ORB_create(
    nfeatures=1000,
    scaleFactor=1.2,
    nlevels=8,
    edgeThreshold=31,
    firstLevel=0,
    WTA_K=2,
    scoreType=cv2.ORB_HARRIS_SCORE,  # same as 0
    patchSize=31,
    fastThreshold=20,
)

scaleFactor, nlevels

scaleFactor is the decimation ratio between consecutive pyramid levels (greater than 1); nlevels is the number of levels. More levels → wider scale coverage, more compute.
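The scale coverage can be worked out directly. This sketch assumes firstLevel=0, so that level i is downscaled by scaleFactor to the power i; `pyramid_scales` is a hypothetical helper and the 640-pixel width is an assumed example.

```python
def pyramid_scales(scale_factor=1.2, nlevels=8):
    """Scale of each pyramid level relative to the input image."""
    return [scale_factor ** level for level in range(nlevels)]


scales = pyramid_scales()
coverage = scales[-1] / scales[0]   # ratio between coarsest and finest level
level_widths = [round(640 / s) for s in scales]  # for an assumed 640-px-wide image
```

With scaleFactor=1.2 and nlevels=8 the coarsest level is about 3.6 times smaller than the input, so features can be matched across roughly that range of zoom.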

fastThreshold

FAST intensity difference threshold; lower → more corners (noisier).
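A simplified sketch of the FAST segment test shows how the threshold controls the corner count. It assumes the common FAST-9 variant (9 contiguous pixels out of the 16 on the circle); `is_fast_corner` is a hypothetical helper and the pixel values are made up.

```python
def is_fast_corner(center, ring, threshold, arc=9):
    """Segment test: True if `arc` contiguous ring pixels are all brighter
    than center + threshold or all darker than center - threshold."""
    for sign in (+1, -1):
        run = 0
        # Walk the ring twice so runs that wrap around the end are counted
        for value in ring + ring:
            if sign * (value - center) > threshold:
                run += 1
                if run >= arc:
                    return True
            else:
                run = 0
    return False


# 16 ring pixels around a center of 100: a bright arc of 9 that wraps around
center = 100
ring = [150] * 5 + [100] * 7 + [150] * 4

corner_low_t = is_fast_corner(center, ring, threshold=20)
corner_high_t = is_fast_corner(center, ring, threshold=60)
```

The same pixel is a corner at threshold 20 but not at 60, because the arc is only 50 gray levels brighter than the center.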

Two detector score modes

# scoreType: cv2.ORB_HARRIS_SCORE (0, default) or cv2.ORB_FAST_SCORE (1)
orb_harris = cv2.ORB_create(scoreType=cv2.ORB_HARRIS_SCORE)
orb_fast = cv2.ORB_create(scoreType=cv2.ORB_FAST_SCORE)

Harris scoring re-ranks FAST corners for stability; FAST-only is a bit cheaper.

Detect, compute, Hamming match

import cv2

g1 = cv2.imread("view1.jpg", cv2.IMREAD_GRAYSCALE)
g2 = cv2.imread("view2.jpg", cv2.IMREAD_GRAYSCALE)
orb = cv2.ORB_create(800)
k1, d1 = orb.detectAndCompute(g1, None)
k2, d2 = orb.detectAndCompute(g2, None)

bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(bf.match(d1, d2), key=lambda m: m.distance)
vis = cv2.drawMatches(g1, k1, g2, k2, matches[:60], None,
                      flags=cv2.DrawMatchesFlags_NOT_DRAW_SINGLE_POINTS)  # same as flags=2
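The Hamming distance used here is simply the number of differing bits between two binary descriptors. A minimal sketch on 32-byte strings with assumed contents (`hamming_distance` is a hypothetical helper, not an OpenCV function):

```python
def hamming_distance(a, b):
    """Number of differing bits between two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("descriptors must have the same length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


# Two 32-byte (256-bit) descriptors with assumed contents
d_a = bytes([0b10101010] * 32)
d_b = bytes([0b10101010] * 31 + [0b10101001])  # last byte differs in 2 bits
d_c = bytes([0b01010101] * 32)                  # every bit flipped

dist_ab = hamming_distance(d_a, d_b)
dist_ac = hamming_distance(d_a, d_c)
```

Distances range from 0 (identical) to 256 (all bits differ) for a 256-bit descriptor. XOR plus bit counting is cheap, which is a large part of why ORB matching is fast.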

kNN + ratio (often stronger than crossCheck alone)

bf2 = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=False)
pairs = bf2.knnMatch(d1, d2, k=2)
# knnMatch can return fewer than 2 neighbors for a query; skip those pairs
good = [p[0] for p in pairs
        if len(p) == 2 and p[0].distance < 0.75 * p[1].distance]

Homography from ORB matches

After matching, use RANSAC to estimate a plane-to-plane map—starter for panoramas or planar object detection.

import cv2
import numpy as np

if len(good) >= 4:
    pts1 = np.float32([k1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    pts2 = np.float32([k2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, mask = cv2.findHomography(pts1, pts2, cv2.RANSAC, 5.0)
    # H maps points from image 1 (pts1) to image 2 (pts2); mask flags the RANSAC
    # inliers. H can be None if estimation fails, so check it before use.
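Applying a homography to a point means multiplying by the 3×3 matrix in homogeneous coordinates and dividing by the third component. The sketch below uses hand-written example matrices rather than an estimated H; `apply_homography` is a hypothetical helper (for arrays of points, cv2.perspectiveTransform performs the same mapping).

```python
def apply_homography(h, x, y):
    """Map (x, y) through a 3x3 homography given as nested lists."""
    xh = h[0][0] * x + h[0][1] * y + h[0][2]
    yh = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    if abs(w) < 1e-12:
        raise ValueError("point maps to infinity")
    return xh / w, yh / w


# Assumed example matrices, not estimated from real matches
shift = [[1.0, 0.0, 3.0], [0.0, 1.0, 4.0], [0.0, 0.0, 1.0]]
tilt = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.001, 0.0, 1.0]]

p_shift = apply_homography(shift, 1.0, 2.0)
p_tilt = apply_homography(tilt, 100.0, 50.0)
```

The first matrix is a pure translation by (3, 4). The second has a non-zero entry in the last row, so the division by w shrinks coordinates more as x grows, which is the perspective effect a planar map needs.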

ORB vs SIFT (quick)

ORB: integer-friendly, smaller memory, Hamming match, faster. SIFT: float 128-D, usually more repeatable under hard viewpoint/scale changes, slower. Pick ORB for speed-first; fall back to SIFT when recall matters more.

Takeaways

  • Always match ORB with NORM_HAMMING (or NORM_HAMMING2 if using WTA_K=3 or 4).
  • Tune fastThreshold and nfeatures together for density vs speed.
  • Combine ratio test + findHomography RANSAC for geometrically consistent matches.

Quick FAQ

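Keypoints that change between copies of the same image: FAST is sensitive to noise, and only the top-scoring nfeatures points are kept, so small score changes from JPEG artifacts or exposure shifts can reorder and swap keypoints. For repeatability tests, use the same decode path and a mild blur.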

Convert to grayscale before detectAndCompute unless you use a custom pipeline; ORB in OpenCV expects single-channel input.
