Harris corner detector
Idea: structure tensor
Let Ix, Iy be image gradients. Over a window w, form the second-moment (structure) matrix M = Σ_w [Ix² IxIy; IxIy Iy²], the sum of the gradient outer products. Its eigenvalues λ₁, λ₂ measure how strongly intensity changes along two orthogonal principal directions: both large → corner; one large → edge; both small → flat. Because det(M) = λ₁λ₂ and trace(M) = λ₁ + λ₂, Harris can score each pixel with R = det(M) − k·trace(M)² (empirical k ≈ 0.04–0.06) without an explicit eigenvalue decomposition.
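As a minimal sketch of the idea above, the NumPy function below builds M per pixel from gradient products summed over a block × block window and evaluates R. It assumes an odd block size and uses np.gradient central differences in place of Sobel, so absolute values will not match cv2.cornerHarris; harris_response is a hypothetical helper name. On a synthetic bright square, R should come out positive at a corner, negative on an edge, and zero in a flat region.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def harris_response(img: np.ndarray, block: int = 3, k: float = 0.04) -> np.ndarray:
    """Harris response R = det(M) - k * trace(M)**2 at every pixel (block must be odd)."""
    img = img.astype(np.float64)
    # np.gradient returns derivatives along rows (y) first, then columns (x)
    Iy, Ix = np.gradient(img)
    # Entries of the per-pixel outer product [Ix, Iy]^T [Ix, Iy]
    Ixx, Iyy, Ixy = Ix * Ix, Iy * Iy, Ix * Iy

    def window_sum(a: np.ndarray) -> np.ndarray:
        # Sum over a block x block window centered on each pixel (zero padding at borders)
        r = block // 2
        padded = np.pad(a, r)
        return sliding_window_view(padded, (block, block)).sum(axis=(-2, -1))

    Sxx, Syy, Sxy = window_sum(Ixx), window_sum(Iyy), window_sum(Ixy)
    det = Sxx * Syy - Sxy * Sxy
    trace = Sxx + Syy
    return det - k * trace ** 2


# Synthetic test image: a bright 10x10 square on a dark background
img = np.zeros((20, 20))
img[5:15, 5:15] = 1.0
R = harris_response(img)
print(R[5, 5] > 0, R[5, 10] < 0, R[10, 10] == 0)  # corner, edge, flat -> True True True
```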
blockSize
Neighborhood size over which gradient products are summed to form M. Larger → smoother response, fewer duplicate peaks.
ksize
Sobel aperture for computing Ix, Iy; must be 1, 3, 5, or 7 (3 is common).
k
Harris free parameter in the response; typical range 0.04–0.06. Smaller k lets more edge-like points score as corners; larger k detects fewer corners.
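To see why a small k admits edge-like points, the response can be written in eigenvalues (det = λ₁λ₂, trace = λ₁ + λ₂). The hypothetical helper below evaluates it for one edge-like pair (λ₁ = 10, λ₂ = 0.5, values chosen only for illustration) at both ends of the typical k range.

```python
def harris_from_eigs(l1: float, l2: float, k: float) -> float:
    """Harris response expressed with the eigenvalues of M."""
    det = l1 * l2
    trace = l1 + l2
    return det - k * trace ** 2


# Edge-like structure: strong intensity change in one direction, weak in the other
l1, l2 = 10.0, 0.5
for k in (0.04, 0.06):
    r = harris_from_eigs(l1, l2, k)
    print(k, "corner" if r > 0 else "edge", round(r, 3))
# 0.04 corner 0.59
# 0.06 edge -1.615
```

With k = 0.04 this elongated structure still gets a positive score; raising k to 0.06 pushes it below zero.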
cv2.cornerHarris
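Input must be a single-channel image (8-bit or floating-point); converting to float32, as below, is the usual convention. Output is a float32 response map the same size as the input; threshold it and keep local maxima to get a list of corners.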
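import cv2
import numpy as np
gray = cv2.imread("checkerboard.png", cv2.IMREAD_GRAYSCALE)
assert gray is not None, "could not read checkerboard.png"
gray_f = np.float32(gray)
block, ksz, k = 3, 3, 0.04
resp = cv2.cornerHarris(gray_f, block, ksz, k)
# Simple non-maximum suppression: dilate() replaces each pixel with the max of
# its 3x3 neighborhood, so resp == resp_d holds only at local maxima
resp_d = cv2.dilate(resp, None)
peaks = (resp == resp_d) & (resp > 0.01 * resp.max())
vis = np.zeros_like(gray)
vis[peaks] = 255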
Stricter threshold
thresh = 0.05 * resp.max()
mask = resp > thresh
# optional: keep only local maxima of `resp` on `mask` with further NMS
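The optional NMS step can also be written without OpenCV. This sketch (hypothetical helper harris_peaks, plain NumPy) keeps pixels that equal the maximum of their (2·radius+1)² neighborhood and exceed a fraction of the global maximum, returning (x, y) pairs. Tied pixels on a flat plateau are all kept.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def harris_peaks(resp: np.ndarray, rel_thresh: float = 0.05, radius: int = 1) -> np.ndarray:
    """(x, y) positions where resp is a local maximum and exceeds rel_thresh * resp.max()."""
    resp = np.asarray(resp, dtype=np.float64)
    size = 2 * radius + 1
    # Pad with -inf so border pixels are compared only with real neighbors
    padded = np.pad(resp, radius, mode="constant", constant_values=-np.inf)
    local_max = sliding_window_view(padded, (size, size)).max(axis=(-2, -1))
    keep = (resp == local_max) & (resp > rel_thresh * resp.max())
    ys, xs = np.nonzero(keep)
    return np.stack([xs, ys], axis=1)  # (x, y) order, like OpenCV point arrays


resp = np.zeros((7, 7), dtype=np.float32)
resp[2, 3] = 1.0  # strong peak
resp[2, 4] = 0.5  # neighbor of the strong peak: suppressed
resp[5, 5] = 0.2  # separate, weaker peak: kept (0.2 > 0.05 * 1.0)
print(harris_peaks(resp).tolist())  # [[3, 2], [5, 5]]
```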
Sub-pixel refinement
cornerSubPix refines corner locations to sub-pixel accuracy using the local intensity pattern—useful for calibration, stitching, and metrology.
import cv2
import numpy as np
gray = cv2.imread("grid.jpg", cv2.IMREAD_GRAYSCALE)
gray_f = np.float32(gray)
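resp = cv2.cornerHarris(gray_f, 3, 3, 0.04)
# Seed with local maxima only; plain thresholding yields clusters of adjacent pixels per corner
peaks = (resp == cv2.dilate(resp, None)) & (resp > 0.01 * resp.max())
yxs = np.argwhere(peaks).astype(np.float32)  # rows of (y, x)
# cornerSubPix expects float32 points of shape (N, 1, 2) in (x, y) order
pts = np.zeros((len(yxs), 1, 2), dtype=np.float32)
pts[:, 0, 0] = yxs[:, 1]
pts[:, 0, 1] = yxs[:, 0]
# Stop after 40 iterations or when a corner moves less than 0.001 px
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 40, 0.001)
# winSize=(5, 5) is a half-size (11x11 search window); (-1, -1) means no dead zone
refined = cv2.cornerSubPix(gray, pts, (5, 5), (-1, -1), criteria)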
Harris vs Shi-Tomasi
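goodFeaturesToTrack implements the Shi-Tomasi criterion: each pixel is scored by min(λ₁, λ₂), and points scoring below qualityLevel × (best score) are rejected. It often picks a cleaner set of points for tracking. Harris gives a dense response map you threshold yourself; goodFeaturesToTrack returns at most maxCorners points, sorted by quality and at least minDistance apart. Passing useHarrisDetector=True keeps that selection logic but scores with Harris instead.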
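import cv2
# gray: single-channel 8-bit (or float32) image, as loaded in the examples above
# Harris score, with goodFeaturesToTrack's quality threshold, minDistance, and maxCorners cap
corners = cv2.goodFeaturesToTrack(
    gray, maxCorners=100, qualityLevel=0.01, minDistance=10,
    blockSize=3, useHarrisDetector=True, k=0.04)
# Default Shi-Tomasi score: min(λ₁, λ₂) of M
corners_st = cv2.goodFeaturesToTrack(
    gray, maxCorners=100, qualityLevel=0.01, minDistance=10, blockSize=3)
# Both return float32 arrays of shape (N, 1, 2) holding (x, y), strongest corners first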
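Both scores come from the same matrix M. As an illustration (hypothetical helper, NumPy only), the closed-form smaller eigenvalue of the symmetric 2×2 M gives the Shi-Tomasi score, shown next to the Harris R for a corner-like and an edge-like M whose entries are made up for the example.

```python
import numpy as np


def scores_from_M(sxx: float, syy: float, sxy: float, k: float = 0.04):
    """Harris R and Shi-Tomasi min(l1, l2) for M = [[sxx, sxy], [sxy, syy]]."""
    det = sxx * syy - sxy ** 2
    trace = sxx + syy
    harris = det - k * trace ** 2
    # Smaller eigenvalue of a symmetric 2x2 matrix in closed form
    min_eig = trace / 2 - np.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    return float(harris), float(min_eig)


print(scores_from_M(1.0, 1.0, 0.25))  # corner-like: both scores positive
print(scores_from_M(0.0, 1.5, 0.0))   # edge-like: Harris R < 0, min eigenvalue 0
```

Shi-Tomasi thresholds the weaker direction directly, while Harris mixes both eigenvalues through k.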
Takeaways
- Use float32 input; scale the threshold relative to resp.max().
- Increase blockSize to suppress duplicate corners on thick edges.
- Use cornerSubPix when you need accurate coordinates, not just detection.
Quick FAQ
blockSize, or pre-blur slightly. Alternatively, switch to Shi-Tomasi with minDistance and a lower maxCorners.
SIFT
Pipeline in brief
- Build a scale space with Gaussian blur at multiple scales per octave.
- Take Difference of Gaussians (DoG); find 3D extrema (x, y, scale).
- Refine location, discard low-contrast and edge-like points.
- Assign dominant orientation from gradient histograms.
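- Sample a 16×16 neighborhood around the keypoint (rotated to its orientation, scaled to its scale) as a 4×4 grid of cells, each with an 8-bin orientation histogram → 4 × 4 × 8 = 128 floats per keypoint.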
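A heavily simplified sketch of the last step, to show where the 128 numbers come from. It assumes the 16×16 patch is already rotated and scaled, and it omits the Gaussian weighting and trilinear interpolation of the full method; the normalize, clip at 0.2, renormalize step follows Lowe's description. sift_like_descriptor is a hypothetical name, not an OpenCV function.

```python
import numpy as np


def sift_like_descriptor(patch: np.ndarray) -> np.ndarray:
    """Simplified SIFT-style descriptor for a 16x16 patch already aligned to its orientation."""
    assert patch.shape == (16, 16)
    gy, gx = np.gradient(patch.astype(np.float64))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), 2 * np.pi)                 # angles in [0, 2*pi)
    bins = np.minimum((ang / (2 * np.pi) * 8).astype(int), 7)  # 8 orientation bins
    desc = np.zeros((4, 4, 8))
    for r in range(16):
        for c in range(16):
            # Each pixel adds its gradient magnitude to its 4x4 cell's orientation bin
            desc[r // 4, c // 4, bins[r, c]] += mag[r, c]
    v = desc.ravel()                        # 4 * 4 * 8 = 128 values
    v = v / (np.linalg.norm(v) + 1e-12)
    v = np.minimum(v, 0.2)                  # damp very large gradients
    v = v / (np.linalg.norm(v) + 1e-12)
    return v.astype(np.float32)


rng = np.random.default_rng(0)
d = sift_like_descriptor(rng.random((16, 16)))
print(d.shape)  # (128,)
```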
Descriptor distance
Use Euclidean (L2) or L1; BFMatcher with NORM_L2 is the usual baseline.
When to prefer SIFT
Texture-rich scenes, moderate viewpoint change, when ORB struggles with repeatability.
SIFT_create and detectAndCompute
import cv2
gray = cv2.imread("building.jpg", cv2.IMREAD_GRAYSCALE)
sift = cv2.SIFT_create(nfeatures=500, nOctaveLayers=3, contrastThreshold=0.04,
edgeThreshold=10, sigma=1.6)
kp, des = sift.detectAndCompute(gray, None)
print(len(kp), None if des is None else des.shape)
contrastThreshold ↑ → fewer weak keypoints. edgeThreshold ↑ → more points along elongated structures.
Detect only, then compute
kp = sift.detect(gray, None)
kp, des = sift.compute(gray, kp)
Brute-force L2 matching + ratio test
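import cv2
im1 = cv2.imread("a.jpg", cv2.IMREAD_GRAYSCALE)
im2 = cv2.imread("b.jpg", cv2.IMREAD_GRAYSCALE)
sift = cv2.SIFT_create()
k1, d1 = sift.detectAndCompute(im1, None)
k2, d2 = sift.detectAndCompute(im2, None)
# crossCheck=False so knnMatch returns two candidates per query for the ratio test
bf = cv2.BFMatcher(cv2.NORM_L2, crossCheck=False)
pairs = bf.knnMatch(d1, d2, k=2)
good = []
for pair in pairs:
    if len(pair) < 2:  # fewer than two candidates: ratio undefined
        continue
    m, n = pair
    # Lowe's ratio test: keep the best match only if it is clearly better than the runner-up
    if m.distance < 0.7 * n.distance:
        good.append(m)
good.sort(key=lambda m: m.distance)  # draw the strongest matches
vis = cv2.drawMatches(im1, k1, im2, k2, good[:80], None,
                      flags=cv2.DrawMatchesFlags_NOT_DRAW_SINGLE_POINTS)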
FLANN for larger descriptor sets
For thousands of keypoints, FLANN can be faster than exhaustive BF matching. Use KD-tree or k-means index parameters tuned to float descriptors.
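import cv2
import numpy as np
# KD-tree index for float descriptors such as SIFT (d1, d2 from the previous example)
FLANN_INDEX_KDTREE = 1
index_params = dict(algorithm=FLANN_INDEX_KDTREE, trees=5)
search_params = dict(checks=50)  # more checks: more accurate, slower
flann = cv2.FlannBasedMatcher(index_params, search_params)
# The KD-tree index needs float32; SIFT descriptors already are, so this is a safeguard
d1f = d1.astype(np.float32)
d2f = d2.astype(np.float32)
pairs = flann.knnMatch(d1f, d2f, k=2)
# Same ratio test as with BFMatcher; skip queries that returned fewer than two neighbors
good = [p[0] for p in pairs if len(p) == 2 and p[0].distance < 0.7 * p[1].distance]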
Environment notes
If cv2.SIFT_create is missing, install a recent opencv-python: SIFT moved into the main module in OpenCV 4.4.0, after its patent expired in 2020. Older releases exposed it only through opencv-contrib-python as cv2.xfeatures2d.SIFT_create. Check your OpenCV version with print(cv2.__version__).
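One way to cope with both layouts is a capability check rather than a version comparison. The sketch below (hypothetical helper create_sift) takes the imported cv2 module, tries cv2.SIFT_create first, falls back to cv2.xfeatures2d.SIFT_create, and raises if neither exists.

```python
def create_sift(cv):
    """Return a SIFT detector from an imported cv2 module, trying the main module first."""
    if hasattr(cv, "SIFT_create"):
        return cv.SIFT_create()                  # OpenCV 4.4.0 and later
    xf = getattr(cv, "xfeatures2d", None)
    if xf is not None and hasattr(xf, "SIFT_create"):
        return xf.SIFT_create()                  # older contrib builds
    version = getattr(cv, "__version__", "unknown")
    raise RuntimeError(f"SIFT not available in OpenCV {version}")


# Usage (requires OpenCV):
#   import cv2
#   sift = create_sift(cv2)
```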
Takeaways
- SIFT descriptors are float vectors; match with NORM_L2 (or NORM_L1).
- Use kNN + ratio test or geometric verification (findHomography + RANSAC) to drop outliers.
- Heavier than ORB; use FLANN when matching large batches.
ORB
What ORB builds on
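- FAST: tests the 16 pixels on a circle around the candidate and declares a corner when enough contiguous ones are all brighter or all darker than the center by at least fastThreshold; very cheap to evaluate.
- Orientation: the angle of the vector from the keypoint to the intensity centroid of its patch, θ = atan2(m01, m10), gives a main angle per keypoint.
- rBRIEF: pairwise intensity tests in a pattern rotated by that angle → fixed-length binary string (256 bits = 32 bytes by default in OpenCV).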
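The orientation step can be sketched in NumPy: compute the first-order moments m10 and m01 of a circular patch around its center and take θ = atan2(m01, m10). centroid_angle is a hypothetical helper; OpenCV's ORB does this internally with its own patch layout and reports the result in KeyPoint.angle, in degrees.

```python
import numpy as np


def centroid_angle(patch: np.ndarray) -> float:
    """Angle (radians) of the vector from the patch center to its intensity centroid."""
    h, w = patch.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    xs -= (w - 1) / 2.0  # coordinates relative to the patch center
    ys -= (h - 1) / 2.0
    radius = min(h, w) / 2.0
    inside = xs ** 2 + ys ** 2 <= radius ** 2  # circular patch
    p = patch.astype(np.float64) * inside
    m10 = (xs * p).sum()  # first-order moments
    m01 = (ys * p).sum()
    return float(np.arctan2(m01, m10))  # y grows downward in image coordinates


right = np.zeros((31, 31))
right[:, 16:] = 1.0   # brighter on the right -> angle 0
down = np.zeros((31, 31))
down[16:, :] = 1.0    # brighter at the bottom -> angle pi/2
a_right = centroid_angle(right)
a_down = centroid_angle(down)
print(np.degrees(a_right), np.degrees(a_down))
```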
ORB_create parameters
import cv2
orb = cv2.ORB_create(
nfeatures=1000,
scaleFactor=1.2,
nlevels=8,
edgeThreshold=31,
firstLevel=0,
WTA_K=2,
scoreType=0,
patchSize=31,
fastThreshold=20,
)
scaleFactor, nlevels
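scaleFactor is the downscaling ratio between consecutive pyramid levels (2 would halve each side per level; 1.2 is much finer) and nlevels is the number of levels. More levels → wider scale coverage, more compute.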
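Quick arithmetic for the default values: level i is roughly the input downscaled by scaleFactor^i, so 8 levels at 1.2 span about 1.2^7 ≈ 3.58× in scale. The hypothetical helper below lists approximate level sizes for a 640×480 input; OpenCV's own rounding may differ by a pixel.

```python
def pyramid_sizes(width: int, height: int, scale_factor: float = 1.2, nlevels: int = 8):
    """Approximate (width, height) per level; level i is downscaled by scale_factor**i."""
    return [(round(width / scale_factor ** i), round(height / scale_factor ** i))
            for i in range(nlevels)]


sizes = pyramid_sizes(640, 480)
print(sizes[0], sizes[-1])  # (640, 480) (179, 134)
print(round(1.2 ** 7, 2))   # scale span between first and last level: 3.58
```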
fastThreshold
FAST intensity difference threshold; lower → more corners (noisier).
Two detector score modes
# scoreType: 0 = HARRIS_SCORE (default), 1 = FAST_SCORE
orb_harris = cv2.ORB_create(scoreType=0)
orb_fast = cv2.ORB_create(scoreType=1)
Harris scoring re-ranks FAST corners for stability; FAST-only is a bit cheaper.
Detect, compute, Hamming match
import cv2
g1 = cv2.imread("view1.jpg", cv2.IMREAD_GRAYSCALE)
g2 = cv2.imread("view2.jpg", cv2.IMREAD_GRAYSCALE)
orb = cv2.ORB_create(nfeatures=800)
k1, d1 = orb.detectAndCompute(g1, None)
k2, d2 = orb.detectAndCompute(g2, None)
# Binary descriptors: Hamming distance; crossCheck keeps only mutual best matches
bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(bf.match(d1, d2), key=lambda m: m.distance)
vis = cv2.drawMatches(g1, k1, g2, k2, matches[:60], None,
                      flags=cv2.DrawMatchesFlags_NOT_DRAW_SINGLE_POINTS)
kNN + ratio (often stronger than crossCheck alone)
bf2 = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=False)
pairs = bf2.knnMatch(d1, d2, k=2)
# Skip queries with fewer than two neighbors before applying the ratio test
good = [p[0] for p in pairs
        if len(p) == 2 and p[0].distance < 0.75 * p[1].distance]
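To make concrete what NORM_HAMMING and the ratio test compute, here is a brute-force NumPy sketch over packed uint8 descriptors (32 bytes each, as ORB produces by default). It uses O(N1·N2) memory and is meant for small sets only; hamming_matrix and ratio_test_matches are hypothetical helpers, not OpenCV's implementation, and des2 must have at least two rows.

```python
import numpy as np


def hamming_matrix(des1: np.ndarray, des2: np.ndarray) -> np.ndarray:
    """Pairwise Hamming distances between uint8 descriptor rows, shape (N1, N2)."""
    x = np.bitwise_xor(des1[:, None, :], des2[None, :, :])  # (N1, N2, n_bytes)
    return np.unpackbits(x, axis=-1).sum(axis=-1)           # count differing bits


def ratio_test_matches(des1: np.ndarray, des2: np.ndarray, ratio: float = 0.75):
    """Brute-force 2-NN search plus ratio test; returns (query_idx, train_idx) pairs."""
    dist = hamming_matrix(des1, des2)
    nn = np.argsort(dist, axis=1, kind="stable")[:, :2]  # best, second-best train index
    rows = np.arange(len(des1))
    best = dist[rows, nn[:, 0]]
    second = dist[rows, nn[:, 1]]
    keep = best < ratio * second
    return [(int(q), int(t)) for q, t in zip(rows[keep], nn[keep, 0])]


rng = np.random.default_rng(0)
train = rng.integers(0, 256, size=(50, 32), dtype=np.uint8)  # 50 random 256-bit descriptors
query = train[[3, 7]].copy()
query[0, 0] ^= 1                                              # flip one bit of the first query
matches = ratio_test_matches(query, train)
print(matches)  # [(0, 3), (1, 7)]
```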
Homography from ORB matches
After matching, use RANSAC to estimate a plane-to-plane map—starter for panoramas or planar object detection.
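import cv2
import numpy as np
# good, k1, k2 come from the kNN + ratio step above
if len(good) >= 4:  # a homography needs at least 4 correspondences
    pts1 = np.float32([k1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    pts2 = np.float32([k2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    # RANSAC: matches that reproject more than 5.0 px away count as outliers
    H, mask = cv2.findHomography(pts1, pts2, cv2.RANSAC, 5.0)
    # H maps image-1 points to image 2 (pts2 ~ H · pts1); mask marks inliers with 1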
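What "H maps image 1 to image 2" means in coordinates: lift each point to homogeneous form, multiply by H, and divide by the third component. The sketch below (hypothetical helper apply_homography; cv2.perspectiveTransform does the same job in OpenCV) also shows the reprojection error that RANSAC compares against the 5.0-pixel threshold. The matrices and points are made up for illustration.

```python
import numpy as np


def apply_homography(H, pts) -> np.ndarray:
    """Map (N, 2) points with a 3x3 homography: [x', y', w] = H [x, y, 1], then divide by w."""
    H = np.asarray(H, dtype=np.float64)
    pts = np.asarray(pts, dtype=np.float64).reshape(-1, 2)
    homog = np.hstack([pts, np.ones((len(pts), 1))])  # (N, 3) homogeneous points
    mapped = homog @ H.T
    return mapped[:, :2] / mapped[:, 2:3]


# Scale by 2 and translate by (10, 5): (1, 1) -> (12, 7)
H = np.array([[2.0, 0.0, 10.0], [0.0, 2.0, 5.0], [0.0, 0.0, 1.0]])
# A projective example: w = 0.1 * x + 1, so (10, 5) -> (5, 2.5)
H2 = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.1, 0.0, 1.0]])
print(apply_homography(H2, [[10, 5]]).tolist())  # [[5.0, 2.5]]

# Reprojection error of observed matches, the quantity compared with the RANSAC threshold
p1 = np.array([[1.0, 1.0], [0.0, 0.0]])
p2 = np.array([[12.5, 7.0], [30.0, 5.0]])  # the second match is an outlier
err = np.linalg.norm(apply_homography(H, p1) - p2, axis=1)
inliers = err < 5.0
print(err.tolist(), inliers.tolist())  # [0.5, 20.0] [True, False]
```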
ORB vs SIFT (quick)
ORB: binary descriptors (32 bytes by default), small memory footprint, Hamming matching, fast. SIFT: 128-D float descriptors, usually more repeatable under hard viewpoint and scale changes, slower. Pick ORB for speed first; fall back to SIFT when recall matters more.
Takeaways
- Always match ORB with NORM_HAMMING (or NORM_HAMMING2 if using WTA_K=3 or 4).
- Tune fastThreshold and nfeatures together for density vs speed.
- Combine the ratio test with findHomography RANSAC for geometrically consistent matches.
Quick FAQ
detectAndCompute unless you use a custom pipeline; ORB in OpenCV expects single-channel input.