Computer Vision Interview: 20 Essential Q&A (Updated 2026)

Feature Detection Intro: 20 Essential Q&A

Detectors vs descriptors, invariances, and how classical features support matching and SLAM.

~11 min read · 20 questions · Intermediate
keypoints · descriptors · matching · invariance
1 What is a local image feature? ⚡ easy
Answer: A salient image patch with a keypoint (location, scale, orientation) and often a descriptor vector summarizing local appearance for matching.
2 Detector vs descriptor—what's the difference? 📊 medium
Answer: A detector finds stable interest points; a descriptor encodes the local neighborhood for similarity comparison—the two can be mixed (e.g. Harris corners + SIFT descriptors in some pipelines).
3 Why are corners good features? 📊 medium
Answer: High gradient in multiple directions—well localized, repeatable under small viewpoint/light changes vs flat regions or straight edges.
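The "high gradient in multiple directions" intuition can be checked with a minimal Harris response in pure NumPy—a sketch, not a production detector (`window_sum` and the synthetic square are illustrative helpers):

```python
import numpy as np

def window_sum(a, r=2):
    # sum each (2r+1)x(2r+1) neighborhood via shifted copies (np.roll
    # wraps at the border, harmless when features sit away from the edge)
    out = np.zeros_like(a)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            out += np.roll(np.roll(a, dy, axis=0), dx, axis=1)
    return out

def harris_response(img, k=0.04):
    # structure tensor M from image gradients; R = det(M) - k * trace(M)^2
    gy, gx = np.gradient(img.astype(float))
    sxx, syy, sxy = window_sum(gx * gx), window_sum(gy * gy), window_sum(gx * gy)
    return sxx * syy - sxy**2 - k * (sxx + syy) ** 2

# white square on black: response peaks at its corners, not along edges
img = np.zeros((32, 32))
img[8:24, 8:24] = 1.0
R = harris_response(img)
y, x = np.unravel_index(R.argmax(), R.shape)
```

Along a straight edge only one gradient direction fires, so `det(M) ≈ 0` and the response stays low—exactly why corners beat edges for localization.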
4 What is a blob feature? 📊 medium
Answer: Extremum in scale-space (LoG/DoG)—captures roundish regions; complementary to corners for texture-poor scenes.
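A toy DoG response makes the blob idea concrete—a sketch with hand-picked sigmas, using a separable Gaussian built from scratch:

```python
import numpy as np

def gaussian_1d(sigma):
    r = int(3 * sigma) + 1
    x = np.arange(-r, r + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

def blur(img, sigma):
    # separable Gaussian: filter rows, then columns
    k = gaussian_1d(sigma)
    tmp = np.apply_along_axis(np.convolve, 1, img, k, mode="same")
    return np.apply_along_axis(np.convolve, 0, tmp, k, mode="same")

# bright disk: the DoG extremum lands on the blob center
img = np.zeros((64, 64))
yy, xx = np.mgrid[:64, :64]
img[(yy - 32) ** 2 + (xx - 32) ** 2 < 36] = 1.0
dog = blur(img, 2.0) - blur(img, 3.2)  # DoG approximates scale-normalized LoG
cy, cx = np.unravel_index(np.abs(dog).argmax(), dog.shape)
```

A real detector (e.g. SIFT) searches such extrema across a whole stack of sigmas, not just one pair.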
5 What is scale invariance? 📊 medium
Answer: Detect+describe at multiple scales or with scale-normalized patch so matching works across zoom—SIFT pyramid, ORB octave pyramid.
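The octave idea reduces to repeated downsampling—a minimal sketch using 2x2 average pooling in place of proper Gaussian pre-smoothing:

```python
import numpy as np

def octave_pyramid(img, n_octaves=4):
    # halve resolution each octave with 2x2 average pooling
    pyr = [img.astype(float)]
    for _ in range(n_octaves - 1):
        a = pyr[-1]
        h, w = a.shape[0] // 2 * 2, a.shape[1] // 2 * 2  # crop odd dims
        a = a[:h, :w]
        pyr.append((a[0::2, 0::2] + a[1::2, 0::2]
                    + a[0::2, 1::2] + a[1::2, 1::2]) / 4)
    return pyr

pyr = octave_pyramid(np.ones((100, 80)), n_octaves=4)
```

Detecting on every level lets a feature found at full size in one image match the same feature found at quarter size after a 4x zoom.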
6 How to achieve rotation invariance? 📊 medium
Answer: Assign a dominant orientation from the gradient histogram and rotate the patch to a canonical frame—or use rotation-invariant descriptors (some trade away distinctiveness).
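The orientation-assignment step can be sketched as a magnitude-weighted histogram over gradient angles, roughly as in SIFT (the Gaussian weighting window is omitted for brevity):

```python
import numpy as np

def dominant_orientation(patch, n_bins=36):
    # magnitude-weighted histogram of gradient orientations
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.degrees(np.arctan2(gy, gx)) % 360.0
    hist, edges = np.histogram(ang, bins=n_bins, range=(0, 360), weights=mag)
    b = hist.argmax()
    return 0.5 * (edges[b] + edges[b + 1])  # bin center, in degrees

# horizontal intensity ramp -> gradient points along +x -> angle near 0
theta = dominant_orientation(np.tile(np.arange(32.0), (32, 1)))
```

Rotating the patch by `-theta` before describing it gives every view of the point the same canonical frame.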
7 What are affine covariant regions? 🔥 hard
Answer: Regions that deform predictably under affine viewing of planar surfaces—MSER, Harris-Affine family; stronger than similarity for wide baselines.
8 What is NCC template matching? 📊 medium
Answer: Normalized cross-correlation over patches—a brightness/contrast-normalized, SSD-like similarity score; dense and expensive vs sparse keypoints.
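The brightness/contrast normalization is the whole point of NCC—subtracting means and dividing by norms makes the score invariant to affine intensity changes, as this minimal sketch shows:

```python
import numpy as np

def ncc(a, b):
    # normalized cross-correlation: zero-mean patches, unit-norm denominator
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float((a * b).sum() / denom) if denom else 0.0

patch = np.random.default_rng(0).random((16, 16))
score = ncc(patch, 2.5 * patch + 7.0)  # gain + offset change, same content
```

Template matching slides this score over every window of the search image, which is why it is so much costlier than comparing sparse descriptors.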
9 What is Lowe's ratio test? 📊 medium
Answer: Reject match if distance to nearest neighbor is not sufficiently smaller than second-nearest—reduces ambiguous matches.
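The ratio test is a few lines on top of brute-force matching—a sketch with float descriptors and an illustrative 0.75 threshold (Lowe's paper suggests around 0.8):

```python
import numpy as np

def ratio_test_matches(desc1, desc2, ratio=0.75):
    # keep (i, j) only if the best distance clearly beats the second best
    matches = []
    for i, d in enumerate(desc1):
        dist = np.linalg.norm(desc2 - d, axis=1)
        j1, j2 = np.argsort(dist)[:2]
        if dist[j1] < ratio * dist[j2]:
            matches.append((i, int(j1)))
    return matches

d1 = np.array([[0.0, 0.0], [4.0, 4.0]])
d2 = np.array([[0.1, 0.0], [4.0, 4.1], [4.1, 4.0]])
m = ratio_test_matches(d1, d2)
```

The second query descriptor has two near-identical candidates, so its best match is ambiguous and gets rejected—exactly the behavior the test is for.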
10 Mutual nearest neighbor? ⚡ easy
Answer: Accept match only if a is nearest to b and b nearest to a—simple filter for symmetric uniqueness.
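Mutual (cross-check) filtering is equally compact—a NumPy sketch over a full pairwise distance matrix:

```python
import numpy as np

def mutual_nn_matches(desc1, desc2):
    # pairwise distances; keep (i, j) only when i->j and j->i agree
    d = np.linalg.norm(desc1[:, None, :] - desc2[None, :, :], axis=2)
    nn12, nn21 = d.argmin(axis=1), d.argmin(axis=0)
    return [(i, int(j)) for i, j in enumerate(nn12) if nn21[j] == i]

d1 = np.array([[0.0, 0.0], [1.0, 1.0]])
d2 = np.array([[0.9, 1.1], [0.1, 0.1], [0.2, 0.0]])
m = mutual_nn_matches(d1, d2)
```

The third descriptor in `d2` points back to `d1[0]`, which already prefers `d2[1]`, so it is dropped—each point gets at most one symmetric partner.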
11 Why RANSAC after feature matching? 📊 medium
Answer: Estimates geometric model (F/E/H) while rejecting outliers from incorrect matches—essential for robust pose and stitching.
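The RANSAC loop is easiest to see with the simplest geometric model—a 2-D translation, where one correspondence is a minimal sample; the same hypothesize-score-refit structure scales up to H/F/E with larger minimal sets. A sketch (function name and synthetic data are illustrative):

```python
import numpy as np

def ransac_translation(p1, p2, thresh=1.0, iters=50, seed=0):
    # p1, p2: (N, 2) matched points; model: p2 ~ p1 + t
    rng = np.random.default_rng(seed)
    best = np.zeros(len(p1), dtype=bool)
    for _ in range(iters):
        i = rng.integers(len(p1))
        t = p2[i] - p1[i]  # hypothesis from a minimal sample
        inliers = np.linalg.norm(p2 - (p1 + t), axis=1) < thresh
        if inliers.sum() > best.sum():
            best = inliers
    t = (p2[best] - p1[best]).mean(axis=0)  # refit on the consensus set
    return t, best

rng = np.random.default_rng(1)
p1 = rng.random((30, 2)) * 100
p2 = p1 + np.array([5.0, 3.0])             # true motion
p2[:6] += rng.random((6, 2)) * 40 + 20     # 6 gross outliers (bad matches)
t, inliers = ransac_translation(p1, p2)
```

Least squares over all 30 matches would be dragged far off by the six outliers; RANSAC recovers the motion and flags exactly the bad correspondences.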
12 What is bag-of-visual-words? 📊 medium
Answer: Quantize descriptors to vocabulary clusters; image → histogram of words—classic image retrieval / classification before deep CNNs.
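Given a vocabulary (normally learned by k-means over training descriptors), the encoding step is just nearest-centroid assignment plus a histogram—a sketch with a hand-made 3-word vocabulary:

```python
import numpy as np

def bovw_histogram(descs, vocab):
    # assign each descriptor to its nearest visual word; L1-normalize
    d = np.linalg.norm(descs[:, None, :] - vocab[None, :, :], axis=2)
    words = d.argmin(axis=1)
    hist = np.bincount(words, minlength=len(vocab)).astype(float)
    return hist / hist.sum()

vocab = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])  # toy vocabulary
descs = np.array([[0.2, 0.1], [9.8, 0.3], [9.9, -0.1], [0.1, 9.7]])
h = bovw_histogram(descs, vocab)
```

Two images can then be compared by histogram distance (often with tf-idf weighting), regardless of how many keypoints each contained.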
13 Features in tracking? ⚡ easy
Answer: Track keypoints frame-to-frame with KLT, optical flow, or re-detect+match—balance drift vs redetection.
14 Features in SLAM / VO? 📊 medium
Answer: Sparse landmarks for bundle adjustment; need repeatable detection and robust data association across frames.
15 Define repeatability. ⚡ easy
Answer: Same real-world point detected under noise, blur, and viewpoint change—measured by overlap of keypoint regions on benchmark sequences.
16 Define distinctiveness. ⚡ easy
Answer: Descriptor separates correct matches from distractors—low false match rate at fixed threshold.
17 Effect of occlusion? ⚡ easy
Answer: Keypoints disappear; need robust matching, wide baseline tolerance, or dense methods / learning-based segmentation.
18 Why Hamming distance? ⚡ easy
Answer: For binary descriptors (BRIEF, ORB)—XOR + bit count; fast with POPCNT hardware.
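The XOR-and-count operation is a one-liner over packed descriptor bytes—a sketch with toy 16-bit descriptors (real ORB uses 256 bits = 32 bytes):

```python
import numpy as np

def hamming(a, b):
    # XOR the packed bytes, then count set bits (what POPCNT accelerates)
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())

a = np.array([0b11110000, 0b10101010], dtype=np.uint8)
b = np.array([0b11110000, 0b01010101], dtype=np.uint8)
d = hamming(a, b)  # first bytes agree; second bytes differ in all 8 bits
```

No floats, no square roots—this is why matching binary descriptors is so much cheaper than L2 on 128-dim SIFT vectors.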
19 What is FLANN? 📊 medium
Answer: Fast Library for Approximate Nearest Neighbors—speeds k-NN on high-dim descriptors with trees or LSH; trade accuracy for speed.
20 Learned local features? 🔥 hard
Answer: SuperPoint, LIFT, etc.—CNNs predict keypoints+descriptors end-to-end; outperform classical on some benchmarks with enough data.

Features Intro Cheat Sheet

Pipeline
  • Detect
  • Describe
  • Match
Invariance
  • Scale / rotation
  • Affine (harder)
Robustness
  • Ratio test
  • RANSAC

💡 Pro tip: Separate “where” (detector) from “what” (descriptor).

Full tutorial track

Go deeper with the matching tutorial chapter and code examples.