Computer Vision Interview 20 essential Q&A Updated 2026

SIFT: 20 Essential Q&A

Difference-of-Gaussians, keypoint refinement, and why SIFT dominated matching for years.

~12 min read · 20 questions · Advanced
DoG · octaves · 128-D · ratio test
1 What is SIFT? ⚡ easy
Answer: Scale-Invariant Feature Transform. It detects blob-like keypoints as extrema in scale-space and describes each one with a 128-D histogram of gradient orientations. It is invariant to scale and in-plane rotation, and robust to moderate viewpoint and lighting changes.
2 What is Difference of Gaussians (DoG)? 📊 medium
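Answer: DoG(σ) = G(kσ) − G(σ), the difference of two Gaussian blurs of the image at nearby scales. It approximates the scale-normalized Laplacian of Gaussian (LoG), so it is a cheap way to find blob-like structures across scales: the blurred images are needed for the pyramid anyway, and the DoG costs only a subtraction.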
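To make the definition concrete, here is a minimal NumPy sketch that builds a sampled DoG kernel. The function names, the 4σ truncation radius, and the defaults σ = 1.6 and k = 2^(1/3) are illustrative assumptions, not values taken from this page.

```python
import numpy as np

def gaussian_kernel(sigma, radius):
    """Sampled 2-D Gaussian on a (2*radius+1)^2 grid, normalized to sum to 1."""
    ax = np.arange(-radius, radius + 1)
    xx, yy = np.meshgrid(ax, ax)
    g = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return g / g.sum()

def dog_kernel(sigma, k=2 ** (1 / 3), radius=None):
    """Difference of Gaussians: G(k*sigma) - G(sigma) on a shared grid."""
    if radius is None:
        # Assumption: truncate at about 4 standard deviations of the wider Gaussian.
        radius = int(np.ceil(4 * k * sigma))
    return gaussian_kernel(k * sigma, radius) - gaussian_kernel(sigma, radius)

kernel = dog_kernel(sigma=1.6)
# Both Gaussians sum to 1, so the DoG kernel sums to (almost exactly) zero:
# like the LoG, it gives no response on flat image regions.
print(kernel.shape, abs(kernel.sum()) < 1e-12)
```

In a real pipeline the two blurred images are computed once and subtracted, rather than convolving with an explicit DoG kernel.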
3 What is an octave? 📊 medium
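Answer: An octave is a stack of images at one resolution whose blur σ grows by a constant factor per level until σ has doubled. The image with twice the starting σ is then downsampled by 2 to seed the next octave. This covers a large scale range efficiently because coarse scales are processed at low resolution.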
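The blur schedule across octaves can be sketched in a few lines. The function name is hypothetical; the defaults (3 scales per octave, base σ = 1.6, s + 3 blurred images per octave) follow the commonly cited settings from Lowe's paper and are assumptions here.

```python
def sigma_schedule(num_octaves=4, scales_per_octave=3, sigma0=1.6):
    """Absolute blur (in original-image pixels) for every level of every octave."""
    k = 2.0 ** (1.0 / scales_per_octave)   # per-level blur multiplier
    levels = scales_per_octave + 3          # s+3 blurred images give s+2 DoG layers
    schedule = []
    for octave in range(num_octaves):
        base = sigma0 * (2 ** octave)       # each octave starts at double the blur
        schedule.append([base * k ** i for i in range(levels)])
    return schedule

schedule = sigma_schedule()
for octave, sigmas in enumerate(schedule):
    print(octave, [round(s, 2) for s in sigmas])
```

Level s of one octave has the same absolute blur as level 0 of the next, which is why that image can be downsampled to start the next octave.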
4 How are keypoints detected? 🔥 hard
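Answer: Each sample in the DoG volume is compared with its 26 neighbors in a 3×3×3 neighborhood (8 in the same layer, 9 in the layer above, 9 in the layer below). Samples that are a maximum or minimum of that neighborhood become candidate keypoints.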
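A brute-force version of the 26-neighbor test, as a sketch only: it assumes the DoG layers of one octave are stacked in an array of shape (scales, height, width), and the function names are hypothetical. Production code vectorizes this and applies a contrast pre-check first.

```python
import numpy as np

def is_extremum(dog, s, y, x):
    """True if dog[s, y, x] is strictly above or below all 26 neighbors."""
    cube = dog[s - 1:s + 2, y - 1:y + 2, x - 1:x + 2]
    center = dog[s, y, x]
    others = np.delete(cube.ravel(), 13)   # index 13 is the cube's center
    return bool(center > others.max() or center < others.min())

def find_extrema(dog):
    """Scan all interior samples of a (scales, H, W) DoG stack."""
    num_scales, height, width = dog.shape
    hits = []
    for s in range(1, num_scales - 1):
        for y in range(1, height - 1):
            for x in range(1, width - 1):
                if is_extremum(dog, s, y, x):
                    hits.append((s, y, x))
    return hits

# Toy stack: a single bright sample in the middle layer.
demo = np.zeros((3, 5, 5))
demo[1, 2, 2] = 1.0
print(find_extrema(demo))
```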
5 Refinement and edge rejection? 🔥 hard
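Answer: Fit a second-order Taylor expansion of the DoG around each candidate to get a subpixel location and an interpolated scale; reject candidates whose interpolated contrast is low; then use the 2×2 spatial Hessian of the DoG to reject edge-like, unstable peaks whose ratio of principal curvatures is too large (Lowe uses a ratio threshold of r = 10).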
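The edge-rejection part can be sketched with finite differences. This covers only the curvature-ratio test, not the Taylor refinement; the function name is hypothetical and r = 10 is the value from Lowe's paper. The test keeps a point when trace(H)² / det(H) < (r + 1)² / r.

```python
import numpy as np

def passes_edge_test(dog_layer, y, x, r=10.0):
    """Keep a keypoint only if its principal-curvature ratio is below r."""
    d = dog_layer
    # Second derivatives by central finite differences.
    dxx = d[y, x + 1] - 2.0 * d[y, x] + d[y, x - 1]
    dyy = d[y + 1, x] - 2.0 * d[y, x] + d[y - 1, x]
    dxy = (d[y + 1, x + 1] - d[y + 1, x - 1]
           - d[y - 1, x + 1] + d[y - 1, x - 1]) / 4.0
    trace = dxx + dyy
    det = dxx * dyy - dxy ** 2
    if det <= 0:
        # Curvatures have opposite signs or one is zero: not a stable blob.
        return False
    return bool(trace ** 2 / det < (r + 1) ** 2 / r)

ax = np.arange(-2, 3, dtype=float)
xx, yy = np.meshgrid(ax, ax)
blob = -(xx ** 2 + yy ** 2)   # curved equally in both directions
ridge = -(xx ** 2)            # curved in x only, flat along y (edge-like)
print(passes_edge_test(blob, 2, 2), passes_edge_test(ridge, 2, 2))
```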
6 Orientation histogram? 📊 medium
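Answer: Build a histogram of gradient orientations in a neighborhood of the keypoint, weighted by gradient magnitude and a Gaussian window (36 bins in Lowe's paper). The highest peak sets the canonical orientation, and other peaks within 80% of it spawn additional keypoints. Computing the descriptor relative to this orientation makes it rotation invariant.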
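A simplified sketch of the orientation histogram on a square patch. The function name, the window width, and the use of np.gradient are assumptions; it also keeps every bin above the peak ratio rather than only local peaks, and skips the parabolic peak interpolation a full implementation would do.

```python
import numpy as np

def dominant_orientations(patch, num_bins=36, peak_ratio=0.8, window_sigma=4.0):
    """Return (histogram, orientations in degrees) for a grayscale patch."""
    patch = patch.astype(float)
    gy, gx = np.gradient(patch)                      # derivatives along rows, cols
    magnitude = np.hypot(gx, gy)
    angle = np.degrees(np.arctan2(gy, gx)) % 360.0   # orientation in [0, 360)

    # Gaussian window centered on the patch down-weights distant pixels.
    h, w = patch.shape
    yy, xx = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    window = np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2.0 * window_sigma ** 2))

    bin_width = 360.0 / num_bins
    bins = (angle / bin_width).astype(int) % num_bins
    hist = np.bincount(bins.ravel(), weights=(magnitude * window).ravel(),
                       minlength=num_bins)
    # Simplification: every bin within peak_ratio of the maximum counts as a peak.
    peaks = np.flatnonzero(hist >= peak_ratio * hist.max())
    return hist, peaks * bin_width

# Intensity increasing left to right: every gradient points along +x (0 degrees).
ramp = np.tile(np.arange(16.0), (16, 1))
hist, orientations = dominant_orientations(ramp)
print(orientations)
```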
7 How is the descriptor built? 📊 medium
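Answer: A 16×16 sample window around the keypoint, taken at the keypoint's scale and rotated to its canonical orientation, is split into 4×4 cells. Each cell accumulates an 8-bin histogram of gradient orientations weighted by gradient magnitude, giving 4×4×8 = 128 values, which are then normalized.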
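The 4×4×8 layout can be sketched as below. This is deliberately stripped down and is not the full SIFT descriptor: it assumes the patch is already rotated and scaled, and it omits the Gaussian weighting, trilinear interpolation between bins, and normalization. The function name is hypothetical.

```python
import numpy as np

def sift_like_descriptor(patch):
    """Pool gradients of a 16x16 patch into 4x4 cells x 8 orientation bins."""
    if patch.shape != (16, 16):
        raise ValueError("expected a 16x16 patch")
    gy, gx = np.gradient(patch.astype(float))
    magnitude = np.hypot(gx, gy)
    angle = np.arctan2(gy, gx) % (2.0 * np.pi)
    bins = np.floor(angle / (2.0 * np.pi) * 8).astype(int) % 8

    desc = np.zeros((4, 4, 8))
    for cy in range(4):
        for cx in range(4):
            cell = (slice(4 * cy, 4 * cy + 4), slice(4 * cx, 4 * cx + 4))
            # Magnitude-weighted orientation histogram of this 4x4-pixel cell.
            desc[cy, cx] = np.bincount(bins[cell].ravel(),
                                       weights=magnitude[cell].ravel(),
                                       minlength=8)
    return desc.ravel()   # 4 * 4 * 8 = 128 values

ramp = np.tile(np.arange(16.0), (16, 1))
descriptor = sift_like_descriptor(ramp)
print(descriptor.shape)
```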
8 Why 4×4 grid? ⚡ easy
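Answer: It balances distinctiveness against tolerance to misalignment. A coarser grid discards too much spatial layout; a finer grid is more distinctive but more sensitive to localization error and local deformation, and it makes the descriptor longer.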
9 Why normalize twice? 📊 medium
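Answer: L2-normalize the vector, clip each component at 0.2, then L2-normalize again. The first normalization cancels contrast changes (gradients already ignore brightness offsets), which gives invariance to affine lighting changes. The clipping limits the influence of a few very large gradient magnitudes, which adds robustness to non-linear effects such as saturation.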
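The normalize, clip, renormalize sequence is short enough to write out. A minimal sketch: the function name and the small epsilon guard are assumptions, and 0.2 is the clipping value from Lowe's paper.

```python
import numpy as np

def normalize_descriptor(desc, clip=0.2, eps=1e-12):
    """L2-normalize, clip large components, then L2-normalize again."""
    v = np.asarray(desc, dtype=float)
    v = v / (np.linalg.norm(v) + eps)   # cancels a global contrast factor
    v = np.minimum(v, clip)             # damp dominant gradient magnitudes
    return v / (np.linalg.norm(v) + eps)

raw = np.ones(128)
raw[0] = 100.0                           # one dominant component
out = normalize_descriptor(raw)
# Doubling the contrast of the input leaves the result unchanged.
print(np.allclose(out, normalize_descriptor(2.0 * raw)))
```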
10 What is RootSIFT? 📊 medium
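Answer: L1-normalize the SIFT vector, then take the element-wise square root. The result already has unit L2 norm, so a further L2 normalization is redundant. Comparing RootSIFT vectors with Euclidean distance is equivalent to using the Hellinger kernel on the original SIFT vectors, which often improves retrieval.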
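RootSIFT is a two-line transform on top of ordinary SIFT descriptors. A minimal sketch, assuming the descriptors arrive as a non-negative N×128 array; the function name and epsilon are illustrative.

```python
import numpy as np

def root_sift(descriptors, eps=1e-7):
    """L1-normalize each row, then take the element-wise square root."""
    d = np.asarray(descriptors, dtype=float)
    d = d / (np.abs(d).sum(axis=1, keepdims=True) + eps)
    return np.sqrt(d)

rng = np.random.default_rng(0)
raw = rng.random((5, 128))               # stand-in for real SIFT descriptors
rooted = root_sift(raw)
# Each row sums to 1 before the square root, so its squared entries sum to 1 after.
print(np.linalg.norm(rooted, axis=1))
```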
11 SIFT invariances? 📊 medium
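Answer: Invariant to scale and in-plane rotation (through scale-space detection and the dominant orientation) and to affine illumination changes (through normalization). It is only partially robust to the affine distortion caused by moderate viewpoint change, and it is not invariant to strong 3D perspective change.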
12 SIFT vs ORB speed? ⚡ easy
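Answer: SIFT is heavier: it builds a Gaussian pyramid and DoG stack and produces a 128-value floating-point descriptor matched with L2 distance. ORB pairs a FAST detector with a binary descriptor matched with Hamming distance, so it is much faster on CPUs and embedded devices.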
13 SIFT patents? ⚡ easy
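Answer: SIFT was patent-encumbered in the US until the patent expired in March 2020. Until then OpenCV shipped it in the contrib modules behind the nonfree build flag; it is now in the main module and freely usable.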
14 Typical matching? 📊 medium
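Answer: Nearest-neighbor search with L2 distance on the float vectors (for unit-norm descriptors this ranks matches the same way as cosine similarity), then Lowe's ratio test to drop ambiguous matches (keep a match only if the best distance is below roughly 0.7 to 0.8 times the second-best), then RANSAC to fit a homography or fundamental matrix and discard geometric outliers.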
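The nearest-neighbor search and ratio test can be sketched with brute-force NumPy. The function name and the 0.75 threshold are assumptions, the second descriptor set must contain at least two rows, and the RANSAC stage is not shown.

```python
import numpy as np

def ratio_test_matches(desc_a, desc_b, ratio=0.75):
    """Return (index_in_a, index_in_b) pairs that pass Lowe's ratio test."""
    # Pairwise squared L2 distances via ||a||^2 + ||b||^2 - 2 a.b
    sq = (np.sum(desc_a ** 2, axis=1)[:, None]
          + np.sum(desc_b ** 2, axis=1)[None, :]
          - 2.0 * desc_a @ desc_b.T)
    dist = np.sqrt(np.maximum(sq, 0.0))

    nearest = np.argsort(dist, axis=1)[:, :2]   # two closest rows of desc_b
    rows = np.arange(len(desc_a))
    best = dist[rows, nearest[:, 0]]
    second = dist[rows, nearest[:, 1]]
    keep = best < ratio * second                # unambiguous matches only
    return [(int(i), int(nearest[i, 0])) for i in np.flatnonzero(keep)]

rng = np.random.default_rng(0)
desc_b = rng.random((10, 128))
# Two query descriptors: slightly perturbed copies of rows 3 and 7.
desc_a = desc_b[[3, 7]] + 0.001 * rng.random((2, 128))
matches = ratio_test_matches(desc_a, desc_b)
print(matches)
```

The surviving pairs are what would be passed to a RANSAC-based model fit to remove remaining outliers.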
15 Contrast threshold? ⚡ easy
Answer: It discards DoG extrema whose absolute response is below a threshold, which removes unstable keypoints caused by noise in flat, low-contrast regions.
16 Why DoG approximates LoG? 📊 medium
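Answer: It is an approximation, not an exact identity. From the heat-diffusion equation for the Gaussian, G(kσ) − G(σ) ≈ (k − 1)σ²∇²G. The factor (k − 1) is the same at every scale, so it does not move the extrema, and σ² is exactly the scale normalization the LoG needs. The error shrinks as k approaches 1, but the approximation still works well in practice for ratios as large as √2.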
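The relation follows in two steps from the diffusion equation, written here with σ as the scale parameter and k as the ratio between adjacent scales (notation assumed for this sketch):

```latex
\frac{\partial G}{\partial \sigma} = \sigma \nabla^2 G
\quad\Longrightarrow\quad
\sigma \nabla^2 G \approx \frac{G(x, y, k\sigma) - G(x, y, \sigma)}{k\sigma - \sigma}
\quad\Longrightarrow\quad
G(x, y, k\sigma) - G(x, y, \sigma) \approx (k - 1)\,\sigma^2 \nabla^2 G
```

The middle step replaces the derivative with a finite difference between the two scales, which is where the approximation enters.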
17 Color SIFT? 🔥 hard
Answer: Compute SIFT separately on the channels of RGB or an opponent color space, then concatenate or fuse the results. This adds discriminability where color matters, at the cost of a longer descriptor.
18 PCA-SIFT? 🔥 hard
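Answer: Instead of building orientation histograms, project the gradient patch around the keypoint onto a low-dimensional PCA basis learned offline, which gives a smaller descriptor. It is less common now than vanilla SIFT or learned features.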
19 OpenCV? ⚡ easy
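Answer: cv2.SIFT_create() (in the main module since the patent expired) returns a detector object; calling its detectAndCompute(image, mask) method returns the keypoints and an N×128 float32 descriptor array.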
20 Limitations? 📊 medium
Answer: High computation cost; ambiguous matches on repetitive texture; degraded performance under strong motion blur or specular highlights. Learned (deep) features can outperform it when enough training data is available.

SIFT Cheat Sheet

Detect
  • DoG extrema
  • Subpixel + reject
Describe
  • 4×4 × 8 orient
  • Normalize ×2
Match
  • L2 + ratio
  • RANSAC

💡 Pro tip: DoG extrema fix location and scale; the orientation histogram fixes rotation; the 128-D descriptor is a spatial pooling of gradient orientations.
