Local Feature Detectors

Harris corner detector

Idea: structure tensor

Let I_x, I_y be the image gradients. Over a window, sum the outer products of the gradient vectors to form the 2×2 second-moment matrix M = Σ [I_x², I_x·I_y; I_x·I_y, I_y²]. Its eigenvalues λ₁, λ₂ measure gradient energy along the two principal directions: both large → corner; one large → edge; both small → flat. Harris avoids an explicit eigenvalue decomposition by using the response R = det(M) − k·trace(M)² = λ₁λ₂ − k(λ₁ + λ₂)², with an empirical k ≈ 0.04–0.06.
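To make the corner/edge/flat classification concrete, the sketch below evaluates the response for three hand-picked second-moment matrices. It is plain Python; `harris_response` is an illustrative helper rather than an OpenCV function, and the matrix entries are made-up numbers.

```python
def harris_response(sxx, sxy, syy, k=0.04):
    """Harris response for the symmetric matrix M = [[sxx, sxy], [sxy, syy]]."""
    det = sxx * syy - sxy * sxy
    trace = sxx + syy
    return det - k * trace * trace

# Hypothetical window sums of Ix*Ix, Ix*Iy, Iy*Iy
corner = harris_response(10.0, 0.0, 10.0)  # both eigenvalues large
edge = harris_response(10.0, 0.0, 0.0)     # one eigenvalue large
flat = harris_response(0.1, 0.0, 0.1)      # both eigenvalues small

print(corner, edge, flat)
```

A clearly positive value indicates a corner, a negative value an edge, and a value near zero a flat region.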

`blockSize`

Neighborhood size (in pixels) over which the gradient products are summed. Larger → smoother response, fewer duplicate peaks.

`ksize`

Sobel aperture for computing I_x, I_y; must be odd (e.g. 3).

`k`

Harris free parameter in the response; typical range 0.04–0.06. Too small → more edge responses.
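As a rough illustration of why a small k lets edges through, the sketch below evaluates the response for one edge-like pair of eigenvalues at three values of k. The eigenvalues are hypothetical and `harris_from_eigs` is an illustrative helper, not part of OpenCV.

```python
def harris_from_eigs(l1, l2, k):
    """Harris response written in terms of the eigenvalues of M."""
    return l1 * l2 - k * (l1 + l2) ** 2

# Hypothetical edge-like structure: one strong direction, one weak
l1, l2 = 100.0, 5.0
responses = {k: harris_from_eigs(l1, l2, k) for k in (0.02, 0.04, 0.06)}

for k, r in responses.items():
    print(k, r)
```

With these numbers the response is strongly positive at k = 0.02, barely positive at k = 0.04, and negative at k = 0.06, so the same elongated structure passes or fails a positive threshold depending on k.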

`cv2.cornerHarris`

Input is a single-channel image, 8-bit or floating-point; converting to float32 first is the usual convention. Output is a single-channel float32 response map; threshold it and take local maxima to list corners.

import cv2
import numpy as np

gray = cv2.imread("checkerboard.png", cv2.IMREAD_GRAYSCALE)
gray_f = np.float32(gray)

block, ksz, k = 3, 3, 0.04
resp = cv2.cornerHarris(gray_f, block, ksz, k)

# Keep only 3x3 local maxima: a pixel survives if dilation leaves it unchanged
resp_d = cv2.dilate(resp, None)
is_peak = (resp == resp_d) & (resp > 0.01 * resp.max())
vis = np.zeros_like(gray)
vis[is_peak] = 255

Stricter threshold

thresh = 0.05 * resp.max()
mask = resp > thresh
# optional: keep only local maxima of `resp` on `mask` with further NMS
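One simple way to do that last step is a 3×3 local-maximum test on the thresholded response. The sketch below uses nested lists so it runs without OpenCV; `local_maxima` and the toy response values are illustrative assumptions, not an OpenCV API.

```python
def local_maxima(resp, thresh):
    """Return (row, col) of pixels above thresh that are >= all 8 neighbours."""
    h, w = len(resp), len(resp[0])
    peaks = []
    for r in range(h):
        for c in range(w):
            v = resp[r][c]
            if v <= thresh:
                continue
            neighbours = [
                resp[rr][cc]
                for rr in range(max(r - 1, 0), min(r + 2, h))
                for cc in range(max(c - 1, 0), min(c + 2, w))
                if (rr, cc) != (r, c)
            ]
            if all(v >= n for n in neighbours):
                peaks.append((r, c))
    return peaks

# Toy response map with two separate peaks
resp = [
    [0.0, 0.1, 0.0, 0.0],
    [0.1, 0.9, 0.2, 0.0],
    [0.0, 0.2, 0.1, 0.6],
    [0.0, 0.0, 0.5, 0.7],
]
thresh = 0.05 * max(max(row) for row in resp)
print(local_maxima(resp, thresh))
```

Ties on a plateau all survive this test, so real pipelines often add a minimum-distance rule on top.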

Sub-pixel refinement

cornerSubPix refines corner locations to sub-pixel accuracy using the local intensity pattern—useful for calibration, stitching, and metrology.

import cv2
import numpy as np

gray = cv2.imread("grid.jpg", cv2.IMREAD_GRAYSCALE)
gray_f = np.float32(gray)
resp = cv2.cornerHarris(gray_f, 3, 3, 0.04)
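# Every thresholded pixel becomes a seed, so one corner can yield several
# near-identical refined points; apply NMS first if you need one point per corner.
yxs = np.argwhere(resp > 0.01 * resp.max()).astype(np.float32)
# argwhere returns (row, col); cornerSubPix wants (x, y), here shaped (N, 1, 2)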
pts = np.zeros((len(yxs), 1, 2), dtype=np.float32)
pts[:, 0, 0] = yxs[:, 1]
pts[:, 0, 1] = yxs[:, 0]

criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 40, 0.001)
refined = cv2.cornerSubPix(gray, pts, (5, 5), (-1, -1), criteria)

Harris vs Shi-Tomasi

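goodFeaturesToTrack implements the Shi-Tomasi criterion (a threshold on the minimum eigenvalue of M), which often picks a cleaner set of points for tracking. cornerHarris gives a dense response map you threshold yourself; goodFeaturesToTrack returns a capped list of corners, strongest first, spaced at least minDistance apart. Passing useHarrisDetector=True keeps that interface but scores corners with the Harris response instead.

# Harris response, selected and spaced by goodFeaturesToTrack
corners = cv2.goodFeaturesToTrack(
    gray, maxCorners=100, qualityLevel=0.01, minDistance=10,
    blockSize=3, useHarrisDetector=True, k=0.04)

# Shi-Tomasi (default): minimum-eigenvalue score
corners_st = cv2.goodFeaturesToTrack(
    gray, maxCorners=100, qualityLevel=0.01, minDistance=10, blockSize=3)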
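The two scores differ only in how they combine the eigenvalues of M. The sketch below computes both for one hypothetical 2×2 matrix in plain Python; the helper names are illustrative and not OpenCV functions.

```python
import math

def eigenvalues_2x2(sxx, sxy, syy):
    """Eigenvalues (largest first) of the symmetric matrix [[sxx, sxy], [sxy, syy]]."""
    mean = 0.5 * (sxx + syy)
    spread = math.sqrt((0.5 * (sxx - syy)) ** 2 + sxy ** 2)
    return mean + spread, mean - spread

def shi_tomasi_score(sxx, sxy, syy):
    """Shi-Tomasi score: the smaller eigenvalue."""
    return eigenvalues_2x2(sxx, sxy, syy)[1]

def harris_score(sxx, sxy, syy, k=0.04):
    """Harris score: det(M) - k * trace(M)^2, via the eigenvalues."""
    l1, l2 = eigenvalues_2x2(sxx, sxy, syy)
    return l1 * l2 - k * (l1 + l2) ** 2

m = (10.0, 0.0, 2.0)  # hypothetical window sums
st = shi_tomasi_score(*m)
hs = harris_score(*m)
print(st, hs)
```

Shi-Tomasi reads the weaker direction directly, so its threshold has a more direct interpretation than the Harris response, which mixes both eigenvalues through k.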

Takeaways

Use float32 input; scale threshold relative to resp.max().
Increase blockSize to suppress duplicate corners on thick edges.
Use cornerSubPix when you need accurate coordinates, not just detection.

Quick FAQ

Too many detections on textured areas?

Raise the response threshold, increase blockSize, or pre-blur slightly. Alternatively switch to Shi-Tomasi with minDistance and a lower maxCorners.

Is Harris scale-invariant?

No. Harris uses a fixed window, so a corner at one zoom level can look like an edge or a flat region at another. Use SIFT/ORB or a multi-scale Harris pyramid if you need scale robustness.

SIFT

Pipeline in brief

Build a scale space with Gaussian blur at multiple scales per octave.
Take Difference of Gaussians (DoG); find 3D extrema (x, y, scale).
Refine location, discard low-contrast and edge-like points.
Assign dominant orientation from gradient histograms.
Sample a canonical 16×16 neighborhood, rotated to the dominant orientation, as a 4×4 grid of cells with an 8-bin orientation histogram each → 4 × 4 × 8 = 128 floats per keypoint.
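The numbers behind the first and last steps can be worked out in a few lines. The sketch below assumes the standard schedule from Lowe's formulation (blur grows by a factor of 2^(1/s) per layer, with s + 3 Gaussian images per octave); `octave_sigmas` is an illustrative helper, and the defaults mirror the `sigma` and `nOctaveLayers` values commonly used with SIFT.

```python
def octave_sigmas(sigma0=1.6, n_octave_layers=3):
    """Blur levels for one octave under the assumed standard SIFT schedule."""
    k = 2.0 ** (1.0 / n_octave_layers)
    # s + 3 Gaussian images give s + 2 DoG images, so extrema can be
    # searched on s interior scales per octave
    return [sigma0 * k ** i for i in range(n_octave_layers + 3)]

sigmas = octave_sigmas()
n_dog = len(sigmas) - 1

# Descriptor layout: 4x4 spatial cells, 8 orientation bins per cell
descriptor_len = 4 * 4 * 8

print([round(s, 3) for s in sigmas], n_dog, descriptor_len)
```

The blur doubles after s layers, which is the point where the next octave starts from a downsampled image.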

Descriptor distance

Use Euclidean (L2) or L1; BFMatcher with NORM_L2 is the usual baseline.

When to prefer SIFT

Texture-rich scenes, moderate viewpoint change, when ORB struggles with repeatability.

`SIFT_create` and `detectAndCompute`

import cv2

gray = cv2.imread("building.jpg", cv2.IMREAD_GRAYSCALE)
sift = cv2.SIFT_create(nfeatures=500, nOctaveLayers=3, contrastThreshold=0.04,
                       edgeThreshold=10, sigma=1.6)
kp, des = sift.detectAndCompute(gray, None)

print(len(kp), None if des is None else des.shape)

contrastThreshold ↑ → fewer weak keypoints. edgeThreshold ↑ → more points along elongated structures.

Detect only, then compute

kp = sift.detect(gray, None)
kp, des = sift.compute(gray, kp)

Brute-force L2 matching + ratio test

import cv2

im1 = cv2.imread("a.jpg", cv2.IMREAD_GRAYSCALE)
im2 = cv2.imread("b.jpg", cv2.IMREAD_GRAYSCALE)
sift = cv2.SIFT_create()
k1, d1 = sift.detectAndCompute(im1, None)
k2, d2 = sift.detectAndCompute(im2, None)

bf = cv2.BFMatcher(cv2.NORM_L2, crossCheck=False)
pairs = bf.knnMatch(d1, d2, k=2)

good = []
for pair in pairs:
    if len(pair) < 2:
        continue
    m, n = pair
    if m.distance < 0.7 * n.distance:
        good.append(m)

vis = cv2.drawMatches(im1, k1, im2, k2, good[:80], None,
                      flags=cv2.DrawMatchesFlags_NOT_DRAW_SINGLE_POINTS)

FLANN for larger descriptor sets

For thousands of keypoints, FLANN can be faster than exhaustive BF matching. Use KD-tree or k-means index parameters tuned to float descriptors.

import cv2
import numpy as np

FLANN_INDEX_KDTREE = 1
index_params = dict(algorithm=FLANN_INDEX_KDTREE, trees=5)
search_params = dict(checks=50)
flann = cv2.FlannBasedMatcher(index_params, search_params)

d1f = d1.astype(np.float32)
d2f = d2.astype(np.float32)
pairs = flann.knnMatch(d1f, d2f, k=2)

Environment notes

If cv2.SIFT_create is missing, install a recent opencv-python: SIFT moved back into the main build in OpenCV 4.4.0, after the patent expired. Older wheels exposed it only through opencv-contrib-python (as cv2.xfeatures2d.SIFT_create). Check your OpenCV version with print(cv2.__version__).

Takeaways

SIFT descriptors are float vectors; match with NORM_L2 (or L1).
Use kNN + ratio or geometry (findHomography + RANSAC) to drop outliers.
Heavier than ORB; use FLANN when matching large batches.

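Quick FAQ

SIFT vs ORB for a mobile app?

ORB is faster and uses compact binary descriptors; SIFT is often stronger on difficult pairs but costs more CPU and memory. Profile on target hardware.

Normalize descriptors?

Usually not needed. OpenCV's SIFT normalizes each descriptor internally before returning it, so all descriptors are on a common scale and L2 distances are comparable. Note that the returned float32 vectors are scaled to a 0–255 value range rather than to unit length. If you modify the vectors, re-normalize both descriptor sets the same way before L2 matching.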

ORB

What ORB builds on

FAST: compares pixels on a circle around the candidate—very fast corner score.
Orientation: intensity centroid offset gives a main angle per keypoint.
rBRIEF: pairwise intensity tests in a rotated pattern → fixed-length binary string (often 256 bits = 32 bytes in OpenCV).
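The orientation step can be sketched with first-order image moments. The example below is a simplification: it uses a small square patch of made-up intensities (ORB uses a circular patch), and `intensity_centroid_angle` is an illustrative helper, not an OpenCV function.

```python
import math

def intensity_centroid_angle(patch):
    """Angle in degrees from the patch centre to its intensity centroid."""
    h, w = len(patch), len(patch[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    # First-order moments taken about the patch centre
    m10 = sum((x - cx) * patch[y][x] for y in range(h) for x in range(w))
    m01 = sum((y - cy) * patch[y][x] for y in range(h) for x in range(w))
    return math.degrees(math.atan2(m01, m10))

bright_right = [
    [0, 0, 9],
    [0, 0, 9],
    [0, 0, 9],
]
bright_bottom = [
    [0, 0, 0],
    [0, 0, 0],
    [9, 9, 9],
]
angle_right = intensity_centroid_angle(bright_right)
angle_bottom = intensity_centroid_angle(bright_bottom)
print(angle_right, angle_bottom)
```

Because image rows grow downward, a patch that is brighter at the bottom yields a positive angle of 90 degrees in this convention.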

`ORB_create` parameters

import cv2

orb = cv2.ORB_create(
    nfeatures=1000,
    scaleFactor=1.2,
    nlevels=8,
    edgeThreshold=31,
    firstLevel=0,
    WTA_K=2,
    scoreType=cv2.ORB_HARRIS_SCORE,
    patchSize=31,
    fastThreshold=20,
)

`scaleFactor`, `nlevels`

scaleFactor is the decimation ratio between consecutive pyramid levels (must be greater than 1); nlevels is the number of levels. More levels → wider scale coverage, more compute.
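To see what a given pair of values covers, the sketch below lists the scale and approximate image size at each level. `pyramid_levels` is an illustrative helper; the rounding is an assumption and may differ by a pixel from what OpenCV computes internally.

```python
def pyramid_levels(width, height, scale_factor=1.2, nlevels=8):
    """Return (level, scale, width, height) for each pyramid level."""
    levels = []
    for level in range(nlevels):
        scale = scale_factor ** level
        levels.append((level, scale, round(width / scale), round(height / scale)))
    return levels

levels = pyramid_levels(640, 480)
for level, scale, w, h in levels:
    print(level, round(scale, 3), w, h)
```

With the values shown, the coarsest level is about 3.6 times smaller than the input, which bounds the zoom range the detector can cover.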

`fastThreshold`

FAST intensity difference threshold; lower → more corners (noisier).
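The role of the threshold is easiest to see in a simplified segment test. The sketch below checks whether at least 9 contiguous pixels on a 16-pixel ring are all brighter or all darker than the centre by more than the threshold. It omits the high-speed pre-test and corner scoring of real FAST, and `fast_segment_test` plus the ring values are illustrative assumptions.

```python
def fast_segment_test(center, ring, threshold=20, arc=9):
    """Simplified FAST test on 16 ring intensities given in circular order."""
    def has_run(flags):
        # Doubling the list lets a run wrap around the end of the ring
        run = 0
        for flag in flags + flags:
            run = run + 1 if flag else 0
            if run >= arc:
                return True
        return False

    brighter = [p > center + threshold for p in ring]
    darker = [p < center - threshold for p in ring]
    return has_run(brighter) or has_run(darker)

ring_corner = [200] * 10 + [100] * 6           # 10 contiguous bright pixels
ring_wrapped = [200] * 5 + [100] * 7 + [200] * 4  # 9 bright pixels across the wrap
ring_texture = [200, 100] * 8                  # alternating, no long arc

print(fast_segment_test(100, ring_corner))
print(fast_segment_test(100, ring_wrapped))
print(fast_segment_test(100, ring_texture))
print(fast_segment_test(100, ring_corner, threshold=120))
```

Raising the threshold makes the same ring fail the test, which is why a lower fastThreshold produces more (and noisier) corners.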

Two detector score modes

# scoreType: cv2.ORB_HARRIS_SCORE (0, default) or cv2.ORB_FAST_SCORE (1)
orb_harris = cv2.ORB_create(scoreType=cv2.ORB_HARRIS_SCORE)
orb_fast = cv2.ORB_create(scoreType=cv2.ORB_FAST_SCORE)

Harris scoring re-ranks FAST corners for stability; FAST-only is a bit cheaper.

Detect, compute, Hamming match

import cv2

g1 = cv2.imread("view1.jpg", cv2.IMREAD_GRAYSCALE)
g2 = cv2.imread("view2.jpg", cv2.IMREAD_GRAYSCALE)
orb = cv2.ORB_create(800)
k1, d1 = orb.detectAndCompute(g1, None)
k2, d2 = orb.detectAndCompute(g2, None)

bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(bf.match(d1, d2), key=lambda m: m.distance)
vis = cv2.drawMatches(g1, k1, g2, k2, matches[:60], None,
                      flags=cv2.DrawMatchesFlags_NOT_DRAW_SINGLE_POINTS)
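For intuition about what NORM_HAMMING measures, the sketch below counts differing bits between two 32-byte descriptors in plain Python. `hamming_distance` and the descriptor contents are illustrative; BFMatcher does this far faster internally.

```python
def hamming_distance(a: bytes, b: bytes) -> int:
    """Number of bit positions in which two equal-length descriptors differ."""
    if len(a) != len(b):
        raise ValueError("descriptors must have equal length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

# Two made-up 256-bit descriptors (32 bytes each)
d_a = bytes([0b11110000] * 32)
d_b = bytes([0b11111111] * 32)

dist_ab = hamming_distance(d_a, d_b)
dist_aa = hamming_distance(d_a, d_a)
print(dist_ab, dist_aa)
```

The distance is an integer between 0 and 256 for ORB's default descriptor length, so thresholds and ratio tests operate on bit counts rather than Euclidean distances.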

kNN + ratio (often stronger than crossCheck alone)

bf2 = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=False)
pairs = bf2.knnMatch(d1, d2, k=2)
# knnMatch can return fewer than 2 neighbours for a query; skip those pairs
good = [p[0] for p in pairs if len(p) == 2 and p[0].distance < 0.75 * p[1].distance]

Homography from ORB matches

After matching, use RANSAC to estimate a plane-to-plane map—starter for panoramas or planar object detection.

import cv2
import numpy as np

if len(good) >= 4:
    pts1 = np.float32([k1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    pts2 = np.float32([k2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, mask = cv2.findHomography(pts1, pts2, cv2.RANSAC, 5.0)
    # findHomography(src, dst) returns H mapping image-1 points onto image 2;
    # H is None if estimation fails, and mask marks the RANSAC inliers.
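To check which way a homography maps, it helps to push a point through it by hand. The sketch below applies a 3×3 matrix given as nested lists and measures the reprojection error that a RANSAC threshold such as 5.0 is compared against. The helpers and `H_example` are illustrative assumptions, not OpenCV calls.

```python
import math

def apply_homography(H, x, y):
    """Map (x, y) through a 3x3 homography, including the perspective divide."""
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return xh / w, yh / w

def reprojection_error(H, p1, p2):
    """Distance in pixels between H applied to p1 and the matched point p2."""
    x, y = apply_homography(H, p1[0], p1[1])
    return math.hypot(x - p2[0], y - p2[1])

# Hypothetical homography: a pure translation by (+10, -5)
H_example = [
    [1.0, 0.0, 10.0],
    [0.0, 1.0, -5.0],
    [0.0, 0.0, 1.0],
]

mapped = apply_homography(H_example, 3.0, 4.0)
err_exact = reprojection_error(H_example, (3.0, 4.0), (13.0, -1.0))
err_off = reprojection_error(H_example, (3.0, 4.0), (16.0, 3.0))
print(mapped, err_exact, err_off)
```

A match counts as an inlier when this error is within the RANSAC reprojection threshold.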

ORB vs SIFT (quick)

ORB: integer-friendly, smaller memory, Hamming match, faster. SIFT: float 128-D, usually more repeatable under hard viewpoint/scale changes, slower. Pick ORB for speed-first; fall back to SIFT when recall matters more.
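The memory difference is easy to quantify. The sketch below assumes ORB's default 32-byte descriptor and SIFT's 128 float32 values; `descriptor_memory_kib` is an illustrative helper and ignores keypoint metadata.

```python
ORB_BYTES = 32        # 256 bits per descriptor
SIFT_BYTES = 128 * 4  # 128 float32 values per descriptor

def descriptor_memory_kib(n_keypoints, bytes_per_descriptor):
    """Descriptor storage in KiB for a given number of keypoints."""
    return n_keypoints * bytes_per_descriptor / 1024

orb_kib = descriptor_memory_kib(1000, ORB_BYTES)
sift_kib = descriptor_memory_kib(1000, SIFT_BYTES)
print(orb_kib, sift_kib, sift_kib / orb_kib)
```

Under these assumptions SIFT descriptors take 16 times the storage of ORB descriptors for the same number of keypoints.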

Takeaways

Always match ORB with NORM_HAMMING (or NORM_HAMMING2 if using WTA_K=3 or 4).
Tune fastThreshold and nfeatures together for density vs speed.
Combine ratio test + findHomography RANSAC for geometrically consistent matches.

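Quick FAQ

Why identical scenes give different keypoints?

FAST compares raw intensities against a threshold, so small pixel-level differences change which corners pass; JPEG artifacts and exposure shifts move scores. For repeatability tests, use the same decode path and a mild blur.

Color images?

Convert to grayscale before detectAndCompute unless you use a custom pipeline; ORB works on single-channel intensity, and converting explicitly keeps the color-to-gray step under your control.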
