Computer Vision Interview
20 essential Q&A
Updated 2026
SIFT: 20 Essential Q&A
Difference-of-Gaussians, keypoint refinement, and why SIFT dominated matching for years.
~12 min read
20 questions
Advanced
DoG · octaves · 128-D · ratio test
1
What is SIFT?
⚡ easy
Answer: Scale-Invariant Feature Transform—detects blob-like keypoints in scale-space and builds a 128-D gradient-orientation histogram descriptor; robust to scale, rotation, moderate viewpoint/lighting.
2
What is Difference of Gaussians (DoG)?
📊 medium
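Answer: D(x, y, σ) = (G(x, y, kσ) − G(x, y, σ)) * I(x, y), where * is convolution. It is the difference between two Gaussian-blurred copies of the image at neighboring scales σ and kσ. It approximates the scale-normalized Laplacian of Gaussian σ²∇²G up to a constant factor (k − 1). That makes it a cheap way to find blob-like structures across scales: each DoG level is one subtraction of blurred images the pyramid already contains.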
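The following is a minimal NumPy sketch of one octave's DoG stack. It assumes a grayscale float image, σ0 = 1.6 and s = 3 (Lowe's suggested values). The helper names `gaussian_blur` and `dog_stack` are hypothetical. The sketch blurs the input directly to each absolute σ instead of blurring incrementally, and it skips SIFT's initial image doubling. It illustrates the idea rather than reproducing any library's implementation.

```python
import numpy as np


def gaussian_kernel(sigma):
    """1-D Gaussian truncated at 3 sigma and normalized to sum to 1."""
    radius = max(1, int(np.ceil(3 * sigma)))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return k / k.sum()


def gaussian_blur(img, sigma):
    """Separable Gaussian blur with reflected borders; output has img's shape."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    padded = np.pad(np.asarray(img, dtype=np.float64), r, mode="reflect")

    def conv(v):
        # 'valid' trims the padding back off.
        return np.convolve(v, k, mode="valid")

    rows = np.apply_along_axis(conv, 1, padded)   # blur along x
    return np.apply_along_axis(conv, 0, rows)     # then along y


def dog_stack(img, sigma0=1.6, s=3):
    """One octave: s + 3 Gaussian levels give s + 2 DoG levels."""
    k = 2.0 ** (1.0 / s)
    sigmas = [sigma0 * k ** i for i in range(s + 3)]
    blurred = [gaussian_blur(img, sg) for sg in sigmas]
    # D(sigma_i) = L(sigma_{i+1}) - L(sigma_i), i.e. (G(k sigma) - G(sigma)) * I
    dogs = np.stack([b1 - b0 for b0, b1 in zip(blurred, blurred[1:])])
    return dogs, sigmas
```

On a bright Gaussian spot, the DoG values at the center are negative, because more blur lowers the peak. A constant image gives an all-zero stack.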
3
What is an octave?
📊 medium
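Answer: An octave is a group of Gaussian-blurred images at one resolution, whose σ doubles from the first level to level s. With s intervals per octave, adjacent levels differ by k = 2^(1/s). Each octave needs s + 3 blurred images (s + 2 DoG images) so that extrema can be searched at s scales. The next octave starts from the level with twice the initial σ, downsampled by 2, so the pyramid covers a large scale range at modest cost. Lowe found s = 3 worked best.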
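The octave bookkeeping can be sketched in plain Python. `octave_plan` is a hypothetical helper, and the defaults (σ0 = 1.6, s = 3, four octaves) are assumptions. It lists each octave's image size and the absolute blur of its s + 3 levels, measured in original-image pixels.

```python
def octave_plan(height, width, num_octaves=4, s=3, sigma0=1.6):
    """Describe a SIFT-style Gaussian pyramid without computing any images.

    Returns a list of (octave, (rows, cols), sigmas). Each octave holds
    s + 3 blurred levels (hence s + 2 DoG levels); level s has twice the
    starting blur and is the one subsampled by 2 to seed the next octave.
    """
    k = 2.0 ** (1.0 / s)  # blur ratio between adjacent levels
    plan = []
    for o in range(num_octaves):
        rows, cols = height >> o, width >> o  # resolution halves each octave
        # Absolute blur of each level, in original-image pixels.
        sigmas = [sigma0 * (2 ** o) * k ** i for i in range(s + 3)]
        plan.append((o, (rows, cols), sigmas))
    return plan


for octave, shape, sigmas in octave_plan(480, 640):
    print(octave, shape, [round(sg, 2) for sg in sigmas])
```

For a 480×640 image, level s of the first octave (σ = 3.2) has the same absolute blur as the first level of the second octave. That is why level s is the one subsampled to start the next octave.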
4
How are keypoints detected?
🔥 hard
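Answer: Each DoG sample is compared with its 26 neighbors in a 3×3×3 block: 8 in its own level, 9 in the level above, and 9 in the level below. Samples larger than all of their neighbors, or smaller than all of them, become candidate keypoints. The top and bottom DoG levels of an octave have no neighbor level on one side, so only the s interior levels are searched.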
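Below is a direct, unoptimized sketch of the 26-neighbor test on a DoG stack of shape (levels, H, W). `find_extrema` and its optional `threshold` argument are illustrative assumptions. The threshold is a pre-filter on |D|, similar in spirit to the cheap check real implementations run before refinement.

```python
import numpy as np


def find_extrema(dog, threshold=0.0):
    """Return (level, row, col) of strict 3x3x3 extrema in a DoG stack.

    dog has shape (levels, H, W). Border levels and pixels are skipped
    because they lack a full 26-neighborhood. Samples with |D| <= threshold
    are ignored before the comparison (an optional cheap pre-filter).
    """
    levels, h, w = dog.shape
    found = []
    for lvl in range(1, levels - 1):
        for y in range(1, h - 1):
            for x in range(1, w - 1):
                v = dog[lvl, y, x]
                if abs(v) <= threshold:
                    continue
                cube = dog[lvl - 1:lvl + 2, y - 1:y + 2, x - 1:x + 2].ravel()
                neighbors = np.delete(cube, 13)  # index 13 is the center sample
                if v > neighbors.max() or v < neighbors.min():
                    found.append((lvl, y, x))
    return found
```

The comparison is strict, so a sample that ties with any neighbor, for example on a plateau, is not reported.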
5
Refinement and edge rejection?
🔥 hard
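Answer: Fit a second-order Taylor expansion of D around the candidate. The sub-pixel, sub-scale offset is x̂ = −H⁻¹∇D, where ∇D and H are the gradient and 3×3 Hessian of D over (x, y, σ).
- If any component of x̂ exceeds 0.5, the extremum lies closer to another sample, so the fit is repeated there.
- Points with low contrast |D(x̂)| are rejected; Lowe's paper uses a threshold of 0.03 for pixel values in [0, 1].
- Edge responses are removed with the 2×2 spatial Hessian. A point is kept only if Det > 0 and Tr²/Det < (r + 1)²/r, with r = 10, which bounds the ratio of principal curvatures.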
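The sketch below refines a single candidate using central finite differences on a (level, row, col) DoG stack. It assumes pixel values in [0, 1], so Lowe's 0.03 contrast threshold applies. Unlike SIFT, it does not move to the neighboring sample when the offset exceeds 0.5; it simply rejects the point. `refine_and_test` is a hypothetical name, and `np.linalg.solve` raises `LinAlgError` if the Hessian is singular.

```python
import numpy as np


def refine_and_test(dog, lvl, y, x, contrast_thr=0.03, r=10.0):
    """Quadratic refinement of one DoG extremum plus contrast and edge tests.

    dog has shape (levels, H, W). Returns (offset, value) if the candidate
    survives, else None. Offset is ordered (x, y, sigma) in sample units.
    """
    d = dog.astype(np.float64)
    c = d[lvl, y, x]
    # Central-difference gradient of D in (x, y, sigma) order.
    g = 0.5 * np.array([d[lvl, y, x + 1] - d[lvl, y, x - 1],
                        d[lvl, y + 1, x] - d[lvl, y - 1, x],
                        d[lvl + 1, y, x] - d[lvl - 1, y, x]])
    dxx = d[lvl, y, x + 1] + d[lvl, y, x - 1] - 2 * c
    dyy = d[lvl, y + 1, x] + d[lvl, y - 1, x] - 2 * c
    dss = d[lvl + 1, y, x] + d[lvl - 1, y, x] - 2 * c
    dxy = 0.25 * (d[lvl, y + 1, x + 1] - d[lvl, y + 1, x - 1]
                  - d[lvl, y - 1, x + 1] + d[lvl, y - 1, x - 1])
    dxs = 0.25 * (d[lvl + 1, y, x + 1] - d[lvl + 1, y, x - 1]
                  - d[lvl - 1, y, x + 1] + d[lvl - 1, y, x - 1])
    dys = 0.25 * (d[lvl + 1, y + 1, x] - d[lvl + 1, y - 1, x]
                  - d[lvl - 1, y + 1, x] + d[lvl - 1, y - 1, x])
    hess = np.array([[dxx, dxy, dxs], [dxy, dyy, dys], [dxs, dys, dss]])
    offset = -np.linalg.solve(hess, g)           # x_hat = -H^-1 grad(D)
    if np.any(np.abs(offset) > 0.5):
        return None  # SIFT would re-center on the neighboring sample and retry
    value = c + 0.5 * (g @ offset)               # D(x_hat)
    if abs(value) < contrast_thr:
        return None  # low contrast
    tr = dxx + dyy                               # trace of 2x2 spatial Hessian
    det = dxx * dyy - dxy ** 2
    if det <= 0 or tr ** 2 / det >= (r + 1) ** 2 / r:
        return None  # curvatures of opposite sign, or edge-like
    return offset, value
```

On a synthetic quadratic peak centered 0.2 px to the right of the sample, the function recovers the offset (0.2, 0, 0) and the exact peak value. An elongated ridge of the same height fails the edge test.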
6
Orientation histogram?
📊 medium
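Answer: Build a 36-bin histogram (10° per bin) of gradient orientations around the keypoint. Each sample is weighted by its gradient magnitude and by a Gaussian window with σ = 1.5 × the keypoint scale. The highest peak sets the canonical orientation. Any other local peak within 80% of it creates an additional keypoint at the same location with that orientation. The descriptor is computed relative to this angle, which makes it rotation invariant.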
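Here is a simplified NumPy version of the orientation histogram for a square patch centered on the keypoint. It makes these assumptions:
- The patch is already extracted at the keypoint's scale.
- The caller passes σ (1.5 × scale in SIFT).
- Peaks are reported at bin centers.

SIFT additionally smooths the histogram and fits a parabola to the three bins around each peak; this sketch omits both steps. `dominant_orientations` is a hypothetical name.

```python
import numpy as np


def dominant_orientations(patch, sigma, num_bins=36, peak_ratio=0.8):
    """Orientation peaks (degrees, bin centers) of a patch centered on a keypoint."""
    p = np.asarray(patch, dtype=np.float64)
    gy, gx = np.gradient(p)                        # d/drow (y), d/dcol (x)
    mag = np.hypot(gx, gy)
    ang = np.degrees(np.arctan2(gy, gx)) % 360.0
    h, w = p.shape
    yy, xx = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    # Gaussian window centered on the keypoint.
    weight = np.exp(-((xx - cx) ** 2 + (yy - cy) ** 2) / (2 * sigma ** 2))
    bins = (ang * num_bins / 360.0).astype(int) % num_bins
    hist = np.bincount(bins.ravel(), weights=(mag * weight).ravel(),
                       minlength=num_bins)
    top = hist.max()
    peaks = []
    for i in range(num_bins):
        left, right = hist[i - 1], hist[(i + 1) % num_bins]  # circular neighbors
        if hist[i] > left and hist[i] > right and hist[i] >= peak_ratio * top:
            peaks.append(i * 360.0 / num_bins + 180.0 / num_bins)
    return peaks
```

A horizontal intensity ramp yields a single peak in the first bin (5°, the bin center). A diagonal ramp yields a single peak at 45°. Angles follow image coordinates, with y pointing down.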
7
How is the descriptor built?
📊 medium
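Answer:
- Take a 16×16 sample window around the keypoint, rotated to its orientation and scaled to its scale.
- Split the window into a 4×4 grid of 4×4-sample cells.
- Each cell accumulates an 8-bin histogram of gradient orientations. Samples are weighted by gradient magnitude and a Gaussian window (σ equal to half the window width), and trilinear interpolation spreads each sample over neighboring bins.
- Concatenating the cells gives 4 × 4 × 8 = 128 values, which are then normalized.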
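The following simplified 128-D descriptor is built from a 16×16 patch that is assumed to be already rotated and scaled to the keypoint. It keeps the 4×4 cells, 8 orientation bins, magnitude weighting, and Gaussian window. It uses hard binning instead of trilinear interpolation, and only a single L2 normalization. `sift_like_descriptor` is a hypothetical name.

```python
import numpy as np


def sift_like_descriptor(patch):
    """128-D descriptor from a 16x16 patch pre-rotated/scaled to the keypoint."""
    p = np.asarray(patch, dtype=np.float64)
    if p.shape != (16, 16):
        raise ValueError("expected a 16x16 patch")
    gy, gx = np.gradient(p)
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % (2 * np.pi)
    yy, xx = np.mgrid[0:16, 0:16]
    # Gaussian window with sigma = half the window width (8 samples).
    weight = np.exp(-((xx - 7.5) ** 2 + (yy - 7.5) ** 2) / (2 * 8.0 ** 2))
    obin = (ang * 8 / (2 * np.pi)).astype(int) % 8   # 8 bins of 45 degrees
    hist = np.zeros((4, 4, 8))
    # Hard-assign each sample to its 4x4 cell and orientation bin.
    np.add.at(hist, (yy // 4, xx // 4, obin), mag * weight)
    v = hist.ravel()                                  # 4 * 4 * 8 = 128 values
    return (v / max(np.linalg.norm(v), 1e-12)).astype(np.float32)
```

For a pure horizontal ramp, every gradient points at 0°, so only the first orientation bin of each cell is non-zero.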
8
Why 4×4 grid?
⚡ easy
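Answer: In Lowe's experiments, a 4×4 grid with 8 orientations gave the best trade-off. Coarser grids keep less spatial layout and are less distinctive. Finer grids encode more detail, but they become more sensitive to small shape deformations and localization errors. The descriptor also grows, since an n×n grid yields n × n × 8 dimensions.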
9
Why normalize twice?
📊 medium
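Answer: The two steps handle different kinds of lighting change.
- The first L2 normalization makes the vector invariant to affine illumination change. Gradients already cancel a constant brightness offset, and scaling to unit length cancels a global contrast scale.
- Clipping every entry at 0.2 and renormalizing limits the influence of a few very large gradient magnitudes. Non-linear effects such as camera saturation or 3D lighting changes tend to produce these. After clipping, the orientation distribution counts more than raw magnitudes.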
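The normalize-clip-renormalize step on its own is sketched below, assuming a non-negative raw histogram vector. The 0.2 clip value is the one from Lowe's paper, and `normalize_sift` is a hypothetical name.

```python
import numpy as np


def normalize_sift(raw, clip=0.2):
    """L2-normalize, clip large entries, and L2-normalize again."""
    v = np.asarray(raw, dtype=np.float64)
    v = v / max(np.linalg.norm(v), 1e-12)   # removes a global contrast scale
    v = np.minimum(v, clip)                 # caps a few dominant gradient magnitudes
    return v / max(np.linalg.norm(v), 1e-12)
```

Scaling the raw vector (a global contrast change) leaves the output unchanged. For [100, 1, 1, 1], clipping shrinks the dominance of the first entry from 100× to about 20× the others.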
10
What is RootSIFT?
📊 medium
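Answer: L1-normalize each SIFT vector, then take the element-wise square root. The result already has unit L2 norm. Euclidean distance or dot product between RootSIFT vectors corresponds to comparing the original histograms with the Hellinger kernel. It is a cheap drop-in change that often improves retrieval and matching.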
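RootSIFT is a few lines of NumPy on top of any SIFT descriptor array, for example the N × 128 output of OpenCV's `detectAndCompute`. `root_sift` is a hypothetical helper, and the small `eps` only guards against all-zero rows.

```python
import numpy as np


def root_sift(desc, eps=1e-12):
    """Convert an (N, 128) array of SIFT descriptors to RootSIFT (float64)."""
    d = np.asarray(desc, dtype=np.float64)
    d = d / (d.sum(axis=1, keepdims=True) + eps)  # L1-normalize (SIFT entries are >= 0)
    return np.sqrt(d)                             # each row now has unit L2 norm
```

Each row has unit L2 norm, so the dot product of two RootSIFT vectors equals Σᵢ √(pᵢ qᵢ). That is the Hellinger (Bhattacharyya) kernel of the L1-normalized descriptors p and q. As a result, existing L2 matchers and indexes can be reused unchanged.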
11
SIFT invariances?
📊 medium
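Answer: SIFT is invariant to translation, scale, and in-plane rotation (through the dominant orientation), and to affine illumination change (through normalization). It is only partially robust to affine distortion and viewpoint change, because the coarse spatial and orientation bins tolerate small shifts. It is not invariant to strong out-of-plane rotation or perspective.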
12
SIFT vs ORB speed?
⚡ easy
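Answer: SIFT is heavier: it builds a Gaussian/DoG pyramid and computes 128-D float descriptors compared with L2 distance. ORB uses FAST corners and rotated BRIEF binary descriptors (256 bits) compared with Hamming distance (XOR plus popcount). ORB is therefore much faster on CPUs and embedded hardware, usually at some cost in matching robustness.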
13
SIFT patents?
⚡ easy
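Answer: SIFT was patented in the US (assigned to the University of British Columbia), which restricted commercial use until the patent expired in March 2020. OpenCV used to ship SIFT in the opencv_contrib xfeatures2d module, available only when built with OPENCV_ENABLE_NONFREE. Since OpenCV 4.4.0 it is part of the main module and free to use.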
14
Typical matching?
📊 medium
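Answer: Run nearest-neighbor search on the float vectors with L2 distance, or cosine similarity (which ranks neighbors the same way for unit-length descriptors). The search can be brute force or approximate (k-d trees, FLANN). Keep a match only if its distance is clearly smaller than the second-nearest distance (Lowe's ratio test, with thresholds around 0.7 to 0.8). Then verify geometry with RANSAC, e.g. on a homography or fundamental matrix.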
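Below is a brute-force NumPy sketch of L2 nearest-neighbor matching with the ratio test. `ratio_test_matches` is a hypothetical name, 0.8 is the threshold from Lowe's paper, and `desc2` must contain at least two descriptors. In OpenCV, the same search is `cv2.BFMatcher(cv2.NORM_L2).knnMatch(des1, des2, k=2)`, followed by the same comparison on the two returned distances.

```python
import numpy as np


def ratio_test_matches(desc1, desc2, ratio=0.8):
    """Return (i, j) pairs where desc2[j] is desc1[i]'s nearest neighbor
    and the match passes Lowe's ratio test."""
    a = np.asarray(desc1, dtype=np.float64)
    b = np.asarray(desc2, dtype=np.float64)
    # Squared L2 distances via ||a||^2 - 2 a.b + ||b||^2, clipped at 0 for rounding.
    d2 = (a ** 2).sum(1)[:, None] - 2 * a @ b.T + (b ** 2).sum(1)[None, :]
    d = np.sqrt(np.maximum(d2, 0.0))
    nn = np.argsort(d, axis=1)[:, :2]            # two nearest neighbors per query
    rows = np.arange(len(a))
    best, second = d[rows, nn[:, 0]], d[rows, nn[:, 1]]
    keep = best < ratio * second                 # reject ambiguous matches
    return [(int(i), int(nn[i, 0])) for i in np.nonzero(keep)[0]]
```

Suppose the database contains two identical copies of the true match. The best and second-best distances are then equal, and the match is rejected. The ratio test deliberately drops ambiguous correspondences, which is also why it struggles with repetitive texture.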
15
Contrast threshold?
⚡ easy
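Answer: The contrast threshold is the minimum |D| an extremum must reach, after sub-pixel refinement, to be kept. Weak DoG extrema in flat or low-contrast regions are dominated by noise and are unstable. Raising the threshold therefore yields fewer but more repeatable keypoints. OpenCV's SIFT_create exposes it as contrastThreshold (default 0.04).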
16
Why DoG approximates LoG?
📊 medium
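Answer: It is an approximation, not an identity. The Gaussian satisfies the heat equation ∂G/∂σ = σ∇²G, so a finite difference between neighboring scales gives G(kσ) − G(σ) ≈ (k − 1)σ²∇²G. The factor k − 1 is the same at every scale and does not move the extrema. DoG therefore behaves like the scale-normalized LoG while reusing blurs the pyramid computes anyway. The approximation improves as k → 1; with s = 3, k = 2^(1/3) ≈ 1.26.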
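Written out as a short derivation (the heat-equation argument from Lowe's paper, with D the DoG response and * convolution):

```latex
\begin{aligned}
\frac{\partial G}{\partial \sigma} &= \sigma \nabla^2 G \\
\sigma \nabla^2 G &\approx \frac{G(x, y, k\sigma) - G(x, y, \sigma)}{k\sigma - \sigma} \\
G(x, y, k\sigma) - G(x, y, \sigma) &\approx (k - 1)\,\sigma^2 \nabla^2 G \\
D(x, y, \sigma) = \bigl(G(x, y, k\sigma) - G(x, y, \sigma)\bigr) * I(x, y) &\approx (k - 1)\,\sigma^2 \nabla^2 G * I(x, y)
\end{aligned}
```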
17
Color SIFT?
🔥 hard
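Answer: Compute SIFT descriptors on each channel of a color space such as RGB, HSV, or opponent color, typically at keypoints detected on intensity. Then concatenate them (e.g. 3 × 128 = 384-D) or fuse them. This adds discriminability, and depending on the color space some photometric invariance, at the cost of larger descriptors.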
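The sketch below shows the opponent color transform and the concatenation of per-channel descriptors. The transform, as used by opponent-color SIFT variants, is O1 = (R − G)/√2, O2 = (R + G − 2B)/√6, O3 = (R + G + B)/√3. `opponent_channels` and `concat_channel_descriptors` are hypothetical helpers. The sketch assumes descriptors were computed at the same keypoints in each channel. One way to get them is to detect on intensity and pass those keypoints to `sift.compute` on each channel rescaled to 8-bit.

```python
import numpy as np


def opponent_channels(rgb):
    """Convert an (H, W, 3) RGB image to opponent color channels (O1, O2, O3)."""
    x = np.asarray(rgb, dtype=np.float64)
    r, g, b = x[..., 0], x[..., 1], x[..., 2]
    o1 = (r - g) / np.sqrt(2.0)            # red-green opponency
    o2 = (r + g - 2.0 * b) / np.sqrt(6.0)  # yellow-blue opponency
    o3 = (r + g + b) / np.sqrt(3.0)        # intensity
    return np.stack([o1, o2, o3], axis=-1)


def concat_channel_descriptors(per_channel):
    """Stack per-channel (N, 128) descriptors from the SAME keypoints into (N, 128 * C)."""
    return np.hstack(per_channel)
```

O1 and O2 are zero for any gray pixel, so they carry only chromatic information, while O3 is proportional to intensity.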
18
PCA-SIFT?
🔥 hard
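Answer: PCA-SIFT keeps the SIFT detector but replaces the histogram descriptor. It projects the normalized gradient patch around each keypoint onto a PCA basis learned offline from many training patches, giving a much shorter vector. It is less common now than standard SIFT or learned features.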
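Here is a PCA-SIFT-style sketch in NumPy, in three steps:
1. Flatten each patch's x and y gradients into one normalized vector.
2. Learn a PCA basis offline with an SVD.
3. Project new vectors onto the top components.

The helpers (`gradient_vector`, `fit_pca`, `project`), the patch size, and the number of components are illustrative assumptions, not the settings of the original PCA-SIFT paper.

```python
import numpy as np


def gradient_vector(patch):
    """Concatenate x and y gradients of a patch into one unit-length vector."""
    gy, gx = np.gradient(np.asarray(patch, dtype=np.float64))
    v = np.concatenate([gx.ravel(), gy.ravel()])
    return v / max(np.linalg.norm(v), 1e-12)


def fit_pca(train, n_components):
    """Learn a PCA basis from (M, D) training vectors; returns (mean, basis)."""
    mean = train.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    return mean, vt[:n_components]


def project(vectors, mean, basis):
    """Project (N, D) vectors onto the basis, giving (N, n_components) codes."""
    return (vectors - mean) @ basis.T
```

Matching then uses L2 distance on the short projected vectors, which is where the memory and speed savings come from.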
19
OpenCV?
⚡ easy
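Answer: Use cv2.SIFT_create(), which has been in the main cv2 module since OpenCV 4.4.0, after the patent expired. Its detectAndCompute(image, mask) returns the keypoints and an N × 128 float32 descriptor array.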
20
Limitations?
📊 medium
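Answer: SIFT's main limitations are:
- Computational cost: a full scale-space pyramid and 128-D float descriptors.
- Ambiguous matches on repetitive texture, where the ratio test discards many correct correspondences.
- Few or unreliable features in textureless regions, under strong motion blur or specular highlights, and under large viewpoint change.

Learned deep features may perform better when enough training data from the target domain is available.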
SIFT Cheat Sheet
Detect
- DoG extrema
- Subpixel + reject
Describe
- 4×4 cells × 8 orientations = 128-D
- Normalize ×2
Match
- L2 + ratio
- RANSAC
💡 Pro tip: DoG finds scale; orientation hist fixes rotation; 128-D is spatial pooling of gradients.