Computer Vision Interview
80 Q&A
Chapter 2
Image Processing Pipeline — Interview Q&A
Color spaces, geometric transformations, convolution filtering, and edge detection for classical computer vision preprocessing.
Color Spaces: 20 Essential Q&A
1
What is a color space?
⚡ easy
Answer: A coordinate system for representing colors as numeric tuples (e.g. three numbers for trichromatic display). Different spaces emphasize different properties—device RGB for screens, HSV for intuitive hue/saturation edits, LAB for perceptual distance.
2
Describe the RGB additive model.
⚡ easy
Answer: Red, Green, Blue primary lights added for displays. Each channel 0–255 (8-bit) combines to reproduce colors on monitors. It is device-dependent unless tied to a standard like sRGB.
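import cv2
bgr = cv2.imread("input.jpg")  # hypothetical path; OpenCV loads 8-bit images in BGR order
hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)  # for uint8 input, H is in [0, 179]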
3
Why mention BGR separately from RGB?
⚡ easy
Answer: Libraries like OpenCV store channels as B, G, R. Algorithms are identical if consistent, but visualization and pre-trained weights expecting RGB need an explicit swap.
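A minimal NumPy sketch of the explicit swap, assuming an H×W×3 array in BGR order. The pixel values are made up for illustration.

```python
import numpy as np

# A 1x2 image in OpenCV-style BGR order: one pure-blue pixel, one pure-red pixel.
bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

# Reverse the channel axis to get RGB; copy() makes the result contiguous.
rgb = bgr[..., ::-1].copy()

print(rgb[0, 0], rgb[0, 1])
```

The same reversal converts RGB back to BGR, so the operation is its own inverse.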
4
What do H, S, V represent?
📊 medium
Answer: Hue (color tint on a wheel), Saturation (colorfulness vs gray), Value/Brightness (intensity). Cylindrical geometry separates chromatic from achromatic changes more intuitively than RGB for some tasks.
5
Interview: when preprocess with HSV?
📊 medium
Answer: Segmenting by hue ranges (e.g. colored objects), gating on saturation/value so that shadows and highlights are handled separately from hue (which per-channel RGB thresholds cannot do), and some augmentations that tweak hue/saturation while preserving identity.
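As a rough sketch of hue-range segmentation, the snippet below uses the standard-library `colorsys` module on a handful of pixels. The function name `is_red` and the hue band and S/V cutoffs are illustrative assumptions, not tuned values.

```python
import colorsys

def is_red(r, g, b, min_s=0.5, min_v=0.3):
    """Return True if an 8-bit RGB pixel falls in a (hypothetical) 'red' hue band."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    # Red wraps around hue 0, so accept both ends of the [0, 1) hue circle.
    hue_ok = h < 0.05 or h > 0.95
    # Gate on saturation and value so grays and very dark pixels are rejected.
    return hue_ok and s >= min_s and v >= min_v

# Bright red, shadowed red, green, mid gray.
pixels = [(200, 30, 30), (90, 15, 15), (30, 200, 30), (128, 128, 128)]
mask = [is_red(*p) for p in pixels]
print(mask)
```

The shadowed red pixel keeps the same hue as the bright one, which is why a hue gate tolerates brightness changes better than fixed RGB ranges.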
6
Why is LAB used in vision and graphics?
🔥 hard
Answer: L* is lightness; a*, b* are color-opponent dimensions. Euclidean distance in LAB approximates perceptual difference better than RGB. Useful for color transfer, quality metrics, and some clustering tasks.
7
What is YCbCr?
📊 medium
Answer: Separates luma (Y) from chrominance (Cb, Cr). Used in JPEG and video codecs because human vision is more sensitive to brightness than color—enabling chroma subsampling.
8
Where does CMYK appear?
⚡ easy
Answer: Subtractive printing (cyan, magenta, yellow, key/black). Less common in core CV training; relevant for print QA, packaging inspection, and prepress—not for typical RGB camera pipelines.
9
Is grayscale a “color space”?
⚡ easy
Answer: It is a single-channel intensity representation, often derived from RGB via weighted sum. It discards chrominance—good for edge detection and speed when color is irrelevant.
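One common choice of weights is the ITU-R BT.601 luma coefficients. A small sketch, assuming an RGB-ordered uint8 array (swap the weights for BGR input):

```python
import numpy as np

# ITU-R BT.601 luma weights, ordered for an RGB (not BGR) array.
WEIGHTS = np.array([0.299, 0.587, 0.114])

def to_gray(rgb):
    """Weighted sum over the channel axis; returns uint8 intensities."""
    gray = rgb.astype(np.float64) @ WEIGHTS
    return np.round(gray).astype(np.uint8)

# White, pure green, pure blue.
rgb = np.array([[[255, 255, 255], [0, 255, 0], [0, 0, 255]]], dtype=np.uint8)
gray = to_gray(rgb)
print(gray)
```

Green contributes far more than blue to the result, which reflects the eye's higher sensitivity to green.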
10
What does linear RGB mean vs sRGB?
🔥 hard
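Answer: Sensors measure roughly linear light. Stored sRGB values are gamma-encoded with the sRGB transfer function so that code values are spread more evenly across perceived brightness, and the display decodes them back to light. Photometric algorithms (deblur, relighting) should first linearize with the inverse transfer function so that arithmetic on pixel values is physically meaningful.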
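The sRGB transfer function is piecewise: a short linear segment near black and a power curve elsewhere. A sketch of both directions, assuming values already scaled to [0, 1]:

```python
import numpy as np

def srgb_to_linear(v):
    """Decode sRGB-encoded values in [0, 1] to linear light."""
    v = np.asarray(v, dtype=np.float64)
    return np.where(v <= 0.04045, v / 12.92, ((v + 0.055) / 1.055) ** 2.4)

def linear_to_srgb(c):
    """Encode linear-light values in [0, 1] with the sRGB transfer function."""
    c = np.asarray(c, dtype=np.float64)
    return np.where(c <= 0.0031308, 12.92 * c, 1.055 * c ** (1 / 2.4) - 0.055)

encoded = np.array([0.0, 0.5, 1.0])
linear = srgb_to_linear(encoded)      # mid-gray code value maps to about 0.214
roundtrip = linear_to_srgb(linear)
print(linear, roundtrip)
```

An encoded value of 0.5 corresponds to only about 21% of full linear intensity, which is why averaging encoded pixels differs from averaging light.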
11
What is gamma correction?
📊 medium
Answer: Nonlinear mapping between stored values and displayed intensity to match human brightness perception and legacy CRT behavior. Applying the wrong gamma (or applying it twice) skews color statistics and changes blur and threshold results.
12
What is a color gamut?
📊 medium
Answer: The range of colors a device or space can represent. Wide-gamut displays (P3) vs sRGB differ; out-of-gamut colors clip or map when converting—important for medical imaging and professional color.
13
What is a white point / illuminant?
🔥 hard
Answer: Reference neutral light (e.g. D65) for interpreting RGB values. Different cameras/AWB change apparent colors; robust pipelines account for illumination via white balance or learning.
14
What is 4:2:0 chroma subsampling?
📊 medium
Answer: Full luma resolution, with each chroma plane stored at half resolution horizontally and vertically (one quarter of the samples). It exploits lower acuity for color, can cause color fringing on sharp edges when decoded, and is relevant for video compression pipelines.
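A simplified sketch of the sample-count arithmetic, averaging each 2×2 chroma block. Real codecs may use different filters and chroma siting; `subsample_420` is a hypothetical helper name.

```python
import numpy as np

def subsample_420(chroma):
    """Average each 2x2 block of a chroma plane (height and width must be even)."""
    h, w = chroma.shape
    blocks = chroma.reshape(h // 2, 2, w // 2, 2).astype(np.float64)
    return blocks.mean(axis=(1, 3))

y = np.zeros((4, 6))                                   # luma keeps full resolution
cb = np.arange(24, dtype=np.float64).reshape(4, 6)     # toy chroma plane
cb_sub = subsample_420(cb)
print(y.shape, cb_sub.shape)
```

Each chroma plane ends up with one sample per four luma samples.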
15
Should you normalize each RGB channel separately?
⚡ easy
Answer: Sometimes for model input (zero mean / unit var per channel). For photometric consistency, consider normalization that preserves color ratios—or work in a space suited to the task (e.g. LAB L channel only).
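A short sketch of per-channel standardization on a random toy image. In practice the mean and standard deviation usually come from the training set rather than from each image; the single-image statistics here are an assumption for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(8, 8, 3)).astype(np.float32) / 255.0

mean = img.mean(axis=(0, 1))   # one mean per channel
std = img.std(axis=(0, 1))     # one standard deviation per channel
normalized = (img - mean) / (std + 1e-8)   # epsilon guards against flat channels

print(normalized.mean(axis=(0, 1)), normalized.std(axis=(0, 1)))
```

Because each channel is scaled independently, ratios between channels are not preserved, which is the photometric caveat noted in the answer.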
16
How do augmentations interact with color space?
📊 medium
Answer: Random brightness/contrast often in RGB or HSV; hue jitter in HSV. Extreme hue shifts may leave gamut or break class semantics—keep augmentations label-safe.
17
Why is thresholding harder in RGB than gray?
📊 medium
Answer: RGB thresholding needs rules in 3D (ranges per channel or distance to a color). HSV can separate hue cone from lighting via S/V gating—still not perfect under colored illumination.
18
Histogram equalization: per RGB channel vs luminance only?
⚡ easy
Answer: Applying independently to R,G,B shifts colors (color cast). Often convert to LAB and equalize L only, or use CLAHE on luminance to preserve chroma.
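A minimal sketch of equalizing a single luminance plane with its cumulative histogram. It assumes the plane has already been split off (for example, the L channel after a LAB conversion) and that the chroma planes are left untouched. `equalize` is a hypothetical helper.

```python
import numpy as np

def equalize(channel):
    """Histogram-equalize one uint8 plane using its cumulative distribution."""
    hist = np.bincount(channel.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]                 # CDF value at the darkest level present
    span = max(cdf[-1] - cdf_min, 1)          # avoid dividing by zero on flat images
    lut = np.clip(np.round((cdf - cdf_min) / span * 255), 0, 255).astype(np.uint8)
    return lut[channel]

# Hypothetical low-contrast luminance plane.
lum = np.array([[100, 101], [102, 103]], dtype=np.uint8)
lum_eq = equalize(lum)
print(lum_eq)
```

The four nearly identical levels are stretched across the full 0 to 255 range.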
19
Mention one approach to illumination invariance.
🔥 hard
Answer: Retinex-style ideas, white balance, homomorphic filtering (separate illumination/reflectance in log domain), or learning-based methods. Interviews reward naming tradeoffs (artifacts vs compute).
20
Typical order: decode → color convert → resize?
📊 medium
Answer: Often: load image → ensure correct color order → optional WB/gamma fix → resize/crop with good interpolation → normalize to tensor. Order matters: resize after linearization for photometric tasks; many DL pipelines keep it simple in sRGB uint8.
Image Transformations: 20 Essential Q&A
21
What is a geometric image transformation?
⚡ easy
Answer: A mapping that moves pixel locations—translation, rotation, scale, affine, or perspective—while optionally resampling intensities. It changes spatial layout but not the semantic label if the transform is label-consistent (e.g. bbox corners transformed too).
22
Define translation of an image.
⚡ easy
Answer: Shifting all pixels by offsets (tx, ty). Implemented by moving the sampling grid or by an affine matrix whose linear part is the identity and whose last column is (tx, ty). Boundaries may require padding or cropping.
23
What is isotropic vs anisotropic scaling?
⚡ easy
Answer: Isotropic: same scale sx = sy preserves angles. Anisotropic: sx ≠ sy stretches content—can turn circles into ellipses. Know effect on aspect ratio for detection labels.
24
How is rotation about the origin represented in 2D?
📊 medium
Answer: Linear part is matrix [[cos θ, -sin θ],[sin θ, cos θ]]. In practice pick a rotation center (image center) via translate-rotate-translate composition. Large rotations need bigger canvas or cropping.
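The translate-rotate-translate composition can be sketched with 3×3 homogeneous matrices. `rotation_about` is a hypothetical helper, and the angle is in radians. With the image y-axis pointing down, a positive angle appears clockwise on screen.

```python
import numpy as np

def rotation_about(cx, cy, theta):
    """3x3 homogeneous matrix rotating by theta (radians) about (cx, cy)."""
    c, s = np.cos(theta), np.sin(theta)
    to_origin = np.array([[1, 0, -cx], [0, 1, -cy], [0, 0, 1]], dtype=float)
    rotate = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]], dtype=float)
    back = np.array([[1, 0, cx], [0, 1, cy], [0, 0, 1]], dtype=float)
    return back @ rotate @ to_origin   # rightmost matrix is applied first

M = rotation_about(50, 40, np.pi / 2)
center = M @ np.array([50, 40, 1.0])   # the rotation center is a fixed point
corner = M @ np.array([60, 40, 1.0])   # 10 px right of center moves to 10 px along +y
print(center[:2], corner[:2])
```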
25
What does flipping do for ML?
⚡ easy
Answer: Horizontal flip is a common label-preserving augmentation for many object classes; vertical flip may break semantics (people, text, traffic scenes). Always validate against dataset semantics.
26
Homogeneous coordinates for 2D transforms?
📊 medium
Answer: Represent point (x,y) as (x,y,1). Allows affine maps as 3×3 matrices acting on homogeneous vectors, unifying translation with linear maps for composition.
27
What is an affine transformation?
📊 medium
Answer: Maps parallel lines to parallel lines: combination of linear transform and translation—rotation, scale, shear. Preserves ratios along lines but not necessarily lengths or angles unless constrained (similarity/euclidean).
28
How many degrees of freedom does a 2D affine map have?
📊 medium
Answer: Six (4 in the 2×2 linear part + 2 translation). You need 3 point correspondences (non-degenerate) to estimate it in general.
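Each correspondence gives two linear equations, so three non-collinear pairs fix all six unknowns. A sketch using least squares, which is exact for three points and a best fit for more; `estimate_affine` is a hypothetical helper.

```python
import numpy as np

def estimate_affine(src, dst):
    """Solve for the 2x3 affine matrix A with dst = A @ [x, y, 1]."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    X = np.hstack([src, np.ones((len(src), 1))])      # N x 3 design matrix
    A_T, *_ = np.linalg.lstsq(X, dst, rcond=None)     # 3 x 2 solution
    return A_T.T                                      # 2 x 3

src = [(0, 0), (1, 0), (0, 1)]
dst = [(2, 3), (4, 3), (2, 6)]    # scale x by 2 and y by 3, then shift by (2, 3)
A = estimate_affine(src, dst)
print(A)
```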
29
How does perspective differ from affine?
🔥 hard
Answer: Projective maps preserve collinearity but not parallelism—parallel world lines can converge in the image (vanishing points). Needed for planes viewed at an angle, document scanning, and bird’s-eye view from ground cameras.
30
What is a homography?
🔥 hard
Answer: A 3×3 projective transform, defined up to scale (8 DOF), mapping one plane to another in pinhole imaging. Relates two views of the same planar surface. Estimated from at least 4 point correspondences with no three collinear, typically via DLT.
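A bare-bones DLT sketch: each correspondence contributes two rows to a linear system, and the homography is the null vector found by SVD. This omits the coordinate normalization and outlier rejection (e.g. RANSAC) that practical pipelines usually add. `homography_dlt` and `apply_h` are hypothetical helper names.

```python
import numpy as np

def homography_dlt(src, dst):
    """Estimate H (3x3, up to scale) from >= 4 point pairs using the basic DLT."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]     # fix the free scale (assumes H[2, 2] != 0)

def apply_h(H, pt):
    """Map a 2D point through H and divide by the homogeneous coordinate."""
    x, y, w = H @ np.array([pt[0], pt[1], 1.0])
    return x / w, y / w

src = [(0, 0), (1, 0), (1, 1), (0, 1)]
dst = [(0, 0), (2, 0), (1.5, 1), (0.5, 1)]   # unit square -> trapezoid
H = homography_dlt(src, dst)
print(np.round(H, 6))
```

The bottom row of the recovered matrix is not (0, 0, 1), which is what distinguishes it from an affine map.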
31
Forward vs inverse warping?
📊 medium
Answer: Forward: mapping source→dest can leave holes and overlaps. Inverse: for each destination pixel, sample the source via the inverse map; this avoids gaps and is standard in OpenCV warp* functions with a chosen interpolator.
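A compact sketch of inverse warping with nearest-neighbor sampling, written in plain NumPy for illustration (this is not how OpenCV implements it). `warp_inverse` is a hypothetical helper that takes the already-inverted matrix.

```python
import numpy as np

def warp_inverse(src, M_inv, out_shape):
    """Inverse-warp a 2D image with nearest-neighbor sampling.

    M_inv is a 3x3 matrix mapping destination (x, y, 1) to source coordinates.
    """
    h, w = out_shape
    ys, xs = np.mgrid[0:h, 0:w]
    dest = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    sx, sy, sw = M_inv @ dest
    sx = np.round(sx / sw).astype(int)
    sy = np.round(sy / sw).astype(int)
    inside = (sx >= 0) & (sx < src.shape[1]) & (sy >= 0) & (sy < src.shape[0])
    out = np.zeros(h * w, dtype=src.dtype)       # pixels with no source stay 0
    out[inside] = src[sy[inside], sx[inside]]
    return out.reshape(h, w)

src = np.arange(16).reshape(4, 4)
shift_right = np.array([[1, 0, 1], [0, 1, 0], [0, 0, 1]], dtype=float)
out = warp_inverse(src, np.linalg.inv(shift_right), src.shape)
print(out)
```

Every destination pixel is visited exactly once, so the output has no holes; pixels whose source falls outside the image take the fill value.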
32
Why does warping need interpolation?
📊 medium
Answer: Mapped coordinates land between pixels. Nearest, bilinear, bicubic choose neighborhood weights—trade speed vs aliasing/blur. Downscaling may need prefiltering to avoid aliasing.
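import cv2
img = cv2.imread("input.jpg")  # hypothetical path
h, w = img.shape[:2]
angle, scale = 30, 1.0  # example values; angle is in degrees, positive = counter-clockwise
M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, scale)  # 2x3 matrix, rotation about the image center
out = cv2.warpAffine(img, M, (w, h))  # output size is given as (width, height)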
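To make the neighborhood weights concrete, here is a sketch of bilinear sampling at one fractional coordinate. `bilinear` is a hypothetical helper and assumes the coordinate lies inside the image.

```python
import numpy as np

def bilinear(img, x, y):
    """Sample a 2D image at a fractional (x, y); coordinates must lie inside the image."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, img.shape[1] - 1), min(y0 + 1, img.shape[0] - 1)
    fx, fy = x - x0, y - y0
    # Blend horizontally on the two bracketing rows, then blend those vertically.
    top = (1 - fx) * img[y0, x0] + fx * img[y0, x1]
    bottom = (1 - fx) * img[y1, x0] + fx * img[y1, x1]
    return (1 - fy) * top + fy * bottom

img = np.array([[0.0, 10.0], [20.0, 30.0]])
center = bilinear(img, 0.5, 0.5)   # equal weight on all four pixels
print(center)
```

Nearest-neighbor would instead round the coordinate and return a single pixel, which is faster but blockier.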
33
Crop vs pad after transform?
⚡ easy
Answer: Rotation/scale can push content outside the original canvas—either expand canvas with padding (constant, reflect) or crop to a fixed size. Detection boxes must be clipped or transformed consistently.
34
Augmentation: random affine on segmentation masks?
📊 medium
Answer: Apply the same spatial map to image and mask (nearest-neighbor interpolation for label masks to avoid fractional classes). For instance segmentation, warp polygons or rasterize after transform.
35
What is image registration?
🔥 hard
Answer: Aligning two images of the same scene into a common coordinate frame—via feature matching + homography/affine, optical flow, or optimization. Used in medical imaging, panorama stitching, and super-resolution.
36
What is a similarity transform?
📊 medium
Answer: Rotation + uniform scale + translation (4 DOF in 2D). Preserves angles and ratios of lengths—good model when perspective effects are weak.
37
What is a rigid (Euclidean) transform?
⚡ easy
Answer: Rotation + translation only—preserves distances and angles (3 DOF in 2D). Models camera motion parallel to the plane or object pose without scale change.
38
How do you compose transforms?
📊 medium
Answer: Multiply their homogeneous matrices. With column vectors, the rightmost matrix is applied first: p' = M2 · M1 · p applies M1, then M2. Order matters because matrix products do not commute, so stay consistent with your library convention.
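A tiny sketch showing that order changes the result, using the column-vector convention. The matrices are arbitrary examples.

```python
import numpy as np

T = np.array([[1, 0, 10], [0, 1, 0], [0, 0, 1]], dtype=float)   # translate x by 10
S = np.array([[2, 0, 0], [0, 2, 0], [0, 0, 1]], dtype=float)    # scale by 2

p = np.array([1.0, 1.0, 1.0])        # point (1, 1) in homogeneous form

scale_then_translate = (T @ S) @ p   # S is applied first
translate_then_scale = (S @ T) @ p   # T is applied first
print(scale_then_translate[:2], translate_then_scale[:2])
```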
39
OpenCV: warpAffine vs warpPerspective?
⚡ easy
Answer: warpAffine uses a 2×3 affine map; warpPerspective uses full 3×3 homography. Choose based on whether parallelism must be preserved (affine) or full perspective correction is needed.
40
Are lens distortion and homography the same?
📊 medium
Answer: No—radial/tangential distortion is nonlinear and modeled separately (Brown-Conrady) before or jointly with pinhole projection. Undistort first, then apply homography for many planar AR/document pipelines.
Image Filtering: 20 Essential Q&A
41
What is image filtering?
⚡ easy
Answer: Computing a new image where each output pixel is a function of a neighborhood of input pixels. Linear filters use weighted sums (convolution/correlation); nonlinear filters include median, bilateral, morphological ops.
42
Define 2D convolution (discrete).
📊 medium
Answer: Slide a kernel over the image; at each location, sum of elementwise products of kernel and flipped neighborhood (strict convolution). Many libraries implement cross-correlation without flip—know which convention your framework uses.
43
Correlation vs convolution for symmetric kernels?
📊 medium
Answer: For symmetric kernels (Gaussian), results match. For asymmetric kernels (Sobel direction), flip matters for strict signal-processing convolution vs correlation.
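The flip is easy to see with an asymmetric difference kernel. A loop-based sketch over the valid region, written for clarity rather than speed; the helper names are hypothetical.

```python
import numpy as np

def correlate2d_valid(img, k):
    """Cross-correlation over the 'valid' region (no padding, no kernel flip)."""
    kh, kw = k.shape
    out_h, out_w = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out

def convolve2d_valid(img, k):
    """Strict convolution: correlate with the kernel flipped on both axes."""
    return correlate2d_valid(img, k[::-1, ::-1])

img = np.arange(16, dtype=float).reshape(4, 4)   # intensity rises by 1 per column
k = np.array([[-1.0, 0.0, 1.0]])                 # asymmetric horizontal difference

corr = correlate2d_valid(img, k)
conv = convolve2d_valid(img, k)
print(corr[0], conv[0])
```

For this kernel the two conventions differ only in sign; for a symmetric kernel the flipped copy equals the original, so the outputs coincide.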
44
What is a kernel / mask?
⚡ easy
Answer: Small matrix of weights defining neighborhood contributions. Size (e.g. 3×3, 5×5) sets spatial support; larger kernels increase blur radius and compute cost (~kernel area per pixel).
45
Examples of nonlinear filters?
⚡ easy
Answer: Median (order-statistic, good for salt-and-pepper), bilateral (edge-preserving smoothing), morphology (min/max). They do not obey superposition like convolutions.
46
Why use a Gaussian kernel?
📊 medium
Answer: Smooth low-pass filtering that reduces noise and high frequencies while avoiding sharp ringing like ideal low-pass. Separable implementation makes it fast; σ controls blur strength.
import cv2
img = cv2.imread("input.jpg")  # hypothetical path
blur = cv2.GaussianBlur(img, (5, 5), sigmaX=1.0)  # 5x5 kernel; sigmaY defaults to sigmaX
47
What does separable filter mean?
🔥 hard
Answer: A 2D kernel K is separable if it equals an outer product u vᵀ (for a Gaussian, u = v). Convolving with K is then equivalent to a 1D convolution along rows followed by one along columns, so cost drops from O(WHk²) to O(2WHk) for k×k support.
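A quick numerical check of the equivalence, using a 3-tap binomial kernel as a Gaussian-like stand-in and comparing two 1D passes against a direct 2D pass over the valid region.

```python
import numpy as np

v = np.array([1.0, 2.0, 1.0]) / 4.0          # 1D binomial (Gaussian-like) kernel
K = np.outer(v, v)                           # separable 3x3 kernel, K = v v^T

rng = np.random.default_rng(0)
img = rng.random((6, 7))

# Two 1D passes: along rows (axis=1), then along columns (axis=0).
rows = np.apply_along_axis(np.convolve, 1, img, v, mode="valid")
two_pass = np.apply_along_axis(np.convolve, 0, rows, v, mode="valid")

# Direct 2D pass over the same 'valid' region for comparison.
direct = np.zeros((4, 5))
for i in range(4):
    for j in range(5):
        direct[i, j] = np.sum(img[i:i + 3, j:j + 3] * K)

print(np.allclose(two_pass, direct))
```

Per output pixel, the two-pass version uses 3 + 3 multiplies instead of 9, and the gap widens as k grows.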
48
Mean / box filter properties?
⚡ easy
Answer: Simple average of the neighborhood. Fast (especially with integral images), but its frequency response is sinc-like with nulls and side lobes rather than the smooth roll-off of a Gaussian, so it can create blocky artifacts compared to Gaussian blur.
49
When prefer median filtering?
📊 medium
Answer: Impulsive salt-and-pepper noise where mean blur smears outliers. Median preserves edges better than Gaussian for that noise but is costlier and can remove thin structures.
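A small sketch of a 3×3 median filter on a flat image with two impulses. `median3x3` is a hypothetical helper that replicates border pixels.

```python
import numpy as np

def median3x3(img):
    """3x3 median filter with replicated borders."""
    padded = np.pad(img, 1, mode="edge")
    h, w = img.shape
    # Stack the 9 shifted views of the image, then take the median across them.
    windows = np.stack([padded[dy:dy + h, dx:dx + w]
                        for dy in range(3) for dx in range(3)])
    return np.median(windows, axis=0)

img = np.full((5, 5), 100.0)
img[2, 2] = 255.0      # "salt"
img[1, 3] = 0.0        # "pepper"
clean = median3x3(img)
print(clean)
```

Both impulses vanish because each 3×3 window holds at most two outliers out of nine values. A mean filter would instead spread them into their neighbors.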
50
Intuition for bilateral filter?
🔥 hard
Answer: Weighted average where weights drop with both spatial distance and intensity difference—smooths flat regions but preserves sharp edges. Used for denoising and tone mapping; slower than Gaussian.
51
What is padding in convolution?
📊 medium
Answer: Extends the image border so output size can match input (same padding) or follow strict convolution (valid). Modes: zero, reflect, replicate, wrap—choice affects edges and CNN behavior.
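The modes are easiest to compare on a 1D row. This sketch uses NumPy's `np.pad` mode names, which differ from the names used by other libraries.

```python
import numpy as np

row = np.array([1, 2, 3, 4])

zero = np.pad(row, 2, mode="constant")    # [0 0 1 2 3 4 0 0]
reflect = np.pad(row, 2, mode="reflect")  # [3 2 1 2 3 4 3 2]
replicate = np.pad(row, 2, mode="edge")   # [1 1 1 2 3 4 4 4]
wrap = np.pad(row, 2, mode="wrap")        # [3 4 1 2 3 4 1 2]

print(zero, reflect, replicate, wrap)
```

With a kernel of width 5, padding by 2 on each side is what keeps the output the same length as the input.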
52
Why do edges look different after filtering?
⚡ easy
Answer: Neighborhoods at borders are incomplete; padding synthesizes missing neighbors. Wrong padding can cause dark/bright fringes—noticeable on small images and CNN feature maps.
53
Basic sharpening idea?
📊 medium
Answer: Emphasize high frequencies by adding a scaled Laplacian-like response or subtracting a blurred version from the original—makes edges pop but can amplify noise.
54
What is unsharp masking?
📊 medium
Answer: Enhancement: original + amount × (original − blurred). The difference is a high-boost of details; used in photography and preprocessing (with care for noise).
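A 1D sketch of the formula on an ideal step edge. A 3-tap box blur stands in for the Gaussian to keep the arithmetic easy to follow; that substitution and the helper names are assumptions for illustration.

```python
import numpy as np

def box_blur_1d(signal, radius=1):
    """Moving average with replicated borders (stand-in for a Gaussian blur)."""
    padded = np.pad(signal, radius, mode="edge")
    kernel = np.ones(2 * radius + 1) / (2 * radius + 1)
    return np.convolve(padded, kernel, mode="valid")

def unsharp(signal, amount=1.0):
    """original + amount * (original - blurred)."""
    blurred = box_blur_1d(signal)
    return signal + amount * (signal - blurred)

step = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])   # an ideal step edge
sharp = unsharp(step)
print(sharp)
```

The output undershoots just before the edge and overshoots just after it, which is what makes edges look crisper and also what produces halos at large `amount`.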
55
How is CNN stride related to classical filtering?
📊 medium
Answer: Stride >1 subsamples the output, like convolve-then-downsample. Larger stride shrinks the spatial size and makes the receptive field grow faster in later layers; this differs from the stride-1 spatial filtering used in preprocessing.
56
Frequency view of Gaussian blur?
🔥 hard
Answer: Gaussian in space ↔ Gaussian in frequency; it attenuates high frequencies smoothly. Helps before subsampling to limit aliasing (Nyquist)—ties back to image basics.
57
Gaussian noise vs salt-and-pepper—filter choice?
📊 medium
Answer: Gaussian noise: linear smoothing (Gaussian blur) or Wiener/BM3D-class methods at higher level. Salt-and-pepper: median or morphological openings/closings.
58
How do derivative filters relate to filtering?
📊 medium
Answer: Finite differences (Sobel/Prewitt) are short convolution kernels approximating gradients—high-pass. Often paired with prior Gaussian smoothing to reduce noise before edge detection.
59
Why normalize blur kernels?
⚡ easy
Answer: So the DC gain is 1, which preserves average brightness. A sampled Gaussian only sums to 1 after you divide by the sum of its taps; forgetting that normalization scales image intensity.
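A brief sketch of building a normalized 1D Gaussian and checking that a constant signal passes through unchanged. `gaussian_kernel_1d` is a hypothetical helper.

```python
import numpy as np

def gaussian_kernel_1d(size, sigma):
    """Sampled 1D Gaussian, divided by its sum so the DC gain is exactly 1."""
    x = np.arange(size) - (size - 1) / 2.0
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

# Sum of the raw (unnormalized) taps for size 5, sigma 1: well above 1.
raw_sum = np.exp(-(np.arange(-2, 3) ** 2) / 2.0).sum()

k = gaussian_kernel_1d(5, 1.0)
flat = np.full(9, 100.0)
out = np.convolve(flat, k, mode="valid")   # a constant signal should stay at 100
print(raw_sum, k.sum(), out)
```

Using the raw taps would multiply the flat region by `raw_sum`, brightening the image by more than a factor of two.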
60
OpenCV: GaussianBlur vs filter2D?
⚡ easy
Answer: GaussianBlur builds separable Gaussian internally. filter2D applies arbitrary kernel (correlation-style in OpenCV)—flexible for custom linear filters.
Edge Detection: 20 Essential Q&A
61
What is edge detection?
⚡ easy
Answer: Finding boundaries where intensity changes rapidly—object outlines, surface markings, shadows. Edges are local; full segmentation groups pixels into regions.
62
What is the image gradient ∇I?
⚡ easy
Answer: Vector of partial derivatives (Ix, Iy). Magnitude ‖∇I‖ shows edge strength; direction is perpendicular to the edge (along max rate of change).
63
How does the Sobel operator work?
📊 medium
Answer: Discrete 3×3 separable approximation of derivatives with slight smoothing (center weight). Gx and Gy kernels estimate Ix, Iy; combine for magnitude and angle.
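A sketch of the Sobel kernels applied by correlation to a synthetic vertical step edge, followed by magnitude and angle. The loop-based `correlate3x3` helper is hypothetical and chosen for readability.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def correlate3x3(img, k):
    """3x3 cross-correlation over the 'valid' region."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            out[i, j] = np.sum(img[i:i + 3, j:j + 3] * k)
    return out

img = np.zeros((5, 6))
img[:, 3:] = 1.0                     # vertical step edge: dark left, bright right

gx = correlate3x3(img, SOBEL_X)
gy = correlate3x3(img, SOBEL_Y)
magnitude = np.hypot(gx, gy)
angle = np.arctan2(gy, gx)           # radians; 0 means the gradient points along +x
print(gx[0], magnitude[0])
```

The response is nonzero only in the two columns straddling the step, and the gradient points along +x, perpendicular to the vertical edge.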
64
How does Prewitt differ from Sobel?
⚡ easy
Answer: Similar 3×3 derivative masks; weights differ slightly (Sobel weights the center row/column by 2, Prewitt weights all three equally). Both approximate first derivatives, and results are usually close in practice.
65
What does the Laplacian ∇²I detect?
📊 medium
Answer: Second derivative—zero-crossings align with edges. Sensitive to noise; often applied to Gaussian-smoothed image (LoG) for stability.
66
What are zero-crossings of ∇²(G*I)?
📊 medium
Answer: Locations where Laplacian of Gaussian changes sign—candidate edges. Need additional filtering to reduce spurious responses from noise.
67
Why blur with Gaussian before taking derivatives?
📊 medium
Answer: Differentiation amplifies noise. Gaussian low-pass reduces noise while keeping meaningful discontinuities; leads to LoG or smooth gradients for Canny.
68
List the Canny edge detector steps.
🔥 hard
Answer: 1) Gaussian smooth 2) gradient magnitude/direction 3) non-max suppression 4) hysteresis with high/low thresholds to link strong edges and reject weak noise.
69
What is non-maximum suppression (NMS)?
📊 medium
Answer: Thins edges: at each pixel keep magnitude only if it is a local max along the gradient direction—produces one-pixel-wide ridges.
70
What is hysteresis thresholding?
📊 medium
Answer: Use high T_hi to accept strong edges, low T_lo to continue chains from strong pixels—reduces broken edges while suppressing isolated weak noise.
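One way to sketch hysteresis is to start from the strong pixels and repeatedly grow into 8-connected weak pixels until nothing changes. This is an illustrative NumPy version, not the routine any particular library uses; `hysteresis` and the thresholds are assumptions.

```python
import numpy as np

def hysteresis(mag, t_lo, t_hi):
    """Keep weak pixels (>= t_lo) only if 8-connected to a strong pixel (>= t_hi)."""
    strong = mag >= t_hi
    weak = mag >= t_lo
    edges = strong.copy()
    h, w = edges.shape
    while True:
        padded = np.pad(edges, 1, mode="constant")
        # A pixel is "near" an edge if any of the 9 shifted copies is True.
        near = np.zeros_like(edges)
        for dy in range(3):
            for dx in range(3):
                near |= padded[dy:dy + h, dx:dx + w]
        grown = edges | (weak & near)
        if np.array_equal(grown, edges):
            return edges
        edges = grown

mag = np.array([[0, 60, 0, 0, 0],
                [0, 70, 0, 0, 60],
                [0, 200, 0, 0, 0]], dtype=float)
edges = hysteresis(mag, t_lo=50, t_hi=150)
print(edges.astype(int))
```

The weak pixels in column 1 survive because they chain back to the strong pixel at the bottom, while the isolated weak pixel on the right is dropped.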
71
Why are “thick” edges undesirable?
⚡ easy
Answer: Thick edges blur object boundaries, hurt subpixel localization, and complicate linking. NMS aims for single-pixel thickness.
72
How do you detect edges in an RGB image?
📊 medium
Answer: Options: max/mean gradient across channels, convert to luminance first, or vector gradient methods. Channel-wise max is simple; luminance can discard chromatic edges.
73
Why is salt-and-pepper noise bad for edges?
⚡ easy
Answer: Creates spurious large gradients. Median filter first can help; Gaussian blur before derivatives is standard for Gaussian noise.
74
How does scale (σ) affect edges?
📊 medium
Answer: Large σ: fewer, smoother edges (coarse structure). Small σ: more detail and noise. Multi-scale edge detection combines responses at several σ.
75
What is the Laplacian of Gaussian (LoG) idea?
🔥 hard
Answer: Smooth with Gaussian, apply Laplacian, find zero-crossings—Marr-Hildreth approach. Approximated by Difference of Gaussians (DoG) in some pipelines.
76
What is edge linking?
📊 medium
Answer: Connecting broken edge pixels into curves using proximity, direction continuity, or global optimization—step after local edge detection for contours.
77
Typical cv2.Canny parameters?
⚡ easy
Answer: threshold1, threshold2 for hysteresis (low/high), apertureSize for Sobel, L2gradient flag for magnitude formula. Tune for your noise and scale.
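import cv2
img = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical path; single-channel 8-bit input
e = cv2.Canny(img, 50, 150)  # threshold1=50 (low), threshold2=150 (high)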
78
Gradient direction vs edge normal?
📊 medium
Answer: The gradient points in the direction of steepest intensity increase. The edge normal is aligned with the gradient (up to sign); the edge tangent is perpendicular to it.
79
How do you get sub-pixel edge location?
🔥 hard
Answer: Fit parabola to gradient magnitudes along normal, moment-based refinement, or optimization—used in metrology and calibration.
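The parabola fit has a closed form when the three samples are one pixel apart. A minimal sketch, where `subpixel_peak` is a hypothetical helper and the three inputs are gradient magnitudes along the edge normal:

```python
def subpixel_peak(m_prev, m_peak, m_next):
    """Offset of the parabola vertex through samples at positions -1, 0, +1.

    m_peak should be the local maximum, so the offset lies within [-0.5, 0.5].
    """
    denom = m_prev - 2.0 * m_peak + m_next
    if denom == 0:
        return 0.0          # flat neighborhood: no refinement possible
    return 0.5 * (m_prev - m_next) / denom

# Samples of f(x) = 10 - (x - 0.25)^2 at x = -1, 0, 1 (true peak at x = 0.25).
samples = [10 - (x - 0.25) ** 2 for x in (-1, 0, 1)]
offset = subpixel_peak(*samples)
print(offset)
```

The refined edge position is the integer pixel location plus `offset`, measured along the gradient direction.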
80
Edges vs segmentation?
⚡ easy
Answer: Edges are local discontinuities; segmentation assigns each pixel to a region/object. Edges can guide segmentation (watershed, active contours, graphs).