Computer Vision Interview 80 Q&A Chapter 2

Image Processing Pipeline — Interview Q&A

Color spaces, geometric transformations, convolution filtering, and edge detection for classical computer vision preprocessing.


Color Spaces: 20 Essential Q&A

1 What is a color space? ⚡ easy
Answer: A coordinate system for representing colors as numeric tuples (e.g. three numbers for trichromatic display). Different spaces emphasize different properties—device RGB for screens, HSV for intuitive hue/saturation edits, LAB for perceptual distance.
2 Describe the RGB additive model. ⚡ easy
Answer: Red, Green, Blue primary lights added for displays. Each channel 0–255 (8-bit) combines to reproduce colors on monitors. It is device-dependent unless tied to a standard like sRGB.
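import cv2
import numpy as np
bgr = np.zeros((4, 4, 3), dtype=np.uint8)  # placeholder image; cv2.imread also returns BGR order
hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)  # for 8-bit input: H in [0, 179], S and V in [0, 255]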
3 Why mention BGR separately from RGB? ⚡ easy
Answer: Libraries like OpenCV store channels as B, G, R. Algorithms are identical if consistent, but visualization and pre-trained weights expecting RGB need an explicit swap.
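A minimal sketch of the explicit swap, using NumPy only. The tiny two-pixel array is a made-up stand-in for a loaded image.

```python
import numpy as np

# Hypothetical 1x2 image in BGR order: one pure-blue pixel, one pure-red pixel.
bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

# Reversing the last axis swaps B and R; copy() makes the result contiguous.
rgb = bgr[..., ::-1].copy()

print(rgb[0, 0])  # the blue pixel, now stored in R, G, B order
```

Applying the same reversal twice returns the original array, so a double swap silently cancels out; it helps to do the conversion in exactly one place in the pipeline.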
4 What do H, S, V represent? 📊 medium
Answer: Hue (color tint on a wheel), Saturation (colorfulness vs gray), Value/Brightness (intensity). Cylindrical geometry separates chromatic from achromatic changes more intuitively than RGB for some tasks.
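To see how H, S, V separate chromatic from achromatic changes, here is a small sketch with Python's standard-library colorsys module. It works on floats in [0, 1], which differs from OpenCV's 8-bit ranges; the three sample colors are chosen only for illustration.

```python
import colorsys

# colorsys expects r, g, b in [0, 1] and returns h, s, v in [0, 1].
bright_red = colorsys.rgb_to_hsv(1.0, 0.0, 0.0)
dark_red = colorsys.rgb_to_hsv(0.5, 0.0, 0.0)
gray = colorsys.rgb_to_hsv(0.5, 0.5, 0.5)

# Darkening red changes only V; removing all color drives S to zero.
print(bright_red, dark_red, gray)
```

Bright and dark red share the same hue and saturation and differ only in value, while gray has zero saturation, so its hue carries no information.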
5 Interview: when preprocess with HSV? 📊 medium
Answer: Segmenting by hue ranges (e.g. colored objects), thresholding saturation/value to ignore shadows differently than RGB splits, and some augmentations that tweak hue/saturation while preserving identity.
6 Why is LAB used in vision and graphics? 🔥 hard
Answer: L* is lightness; a*, b* are color-opponent dimensions. Euclidean distance in LAB approximates perceptual difference better than RGB. Useful for color transfer, quality metrics, and some clustering tasks.
7 What is YCbCr? 📊 medium
Answer: Separates luma (Y) from chrominance (Cb, Cr). Used in JPEG and video codecs because human vision is more sensitive to brightness than color—enabling chroma subsampling.
8 Where does CMYK appear? ⚡ easy
Answer: Subtractive printing (cyan, magenta, yellow, key/black). Less common in core CV training; relevant for print QA, packaging inspection, and prepress—not for typical RGB camera pipelines.
9 Is grayscale a “color space”? ⚡ easy
Answer: It is a single-channel intensity representation, often derived from RGB via weighted sum. It discards chrominance—good for edge detection and speed when color is irrelevant.
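The weighted sum can be sketched as follows. The weights 0.299, 0.587, 0.114 are the BT.601 luma coefficients, one common choice and an assumption here, since the text does not name specific weights; `rgb_to_gray` is an illustrative helper name.

```python
import numpy as np

def rgb_to_gray(rgb):
    """Weighted sum of R, G, B (BT.601 weights, an assumed choice)."""
    weights = np.array([0.299, 0.587, 0.114])
    return rgb.astype(np.float64) @ weights

# One row with a white, a pure-green, and a pure-blue pixel (RGB order).
pixels = np.array([[[255, 255, 255], [0, 255, 0], [0, 0, 255]]], dtype=np.uint8)
gray = rgb_to_gray(pixels)
print(gray)
```

Green contributes far more to the result than blue, which mirrors the eye's sensitivity; a plain average of the three channels would weight them equally.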
10 What does linear RGB mean vs sRGB? 🔥 hard
Answer: Sensors measure roughly linear light, but stored images are gamma-encoded with the sRGB transfer function so that 8-bit codes are spread more evenly across perceived brightness; the display decodes them back to light. Some photometric algorithms (deblur, relighting) need linearization via the inverse transfer function to be physically correct.
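A sketch of the linearization step, assuming values already scaled to [0, 1] and the standard piecewise sRGB transfer function. The function names are illustrative.

```python
import numpy as np

def srgb_to_linear(c):
    """Decode sRGB-encoded values in [0, 1] to linear light."""
    c = np.asarray(c, dtype=np.float64)
    return np.where(c <= 0.04045, c / 12.92, ((c + 0.055) / 1.055) ** 2.4)

def linear_to_srgb(c):
    """Encode linear-light values in [0, 1] with the sRGB transfer function."""
    c = np.asarray(c, dtype=np.float64)
    return np.where(c <= 0.0031308, c * 12.92, 1.055 * c ** (1 / 2.4) - 0.055)

encoded = np.array([0.0, 0.5, 1.0])
linear = srgb_to_linear(encoded)
print(linear)  # the mid-gray code 0.5 maps to roughly 0.214 in linear light
```

The mid-gray code landing near 0.21 is why averaging or blurring encoded values gives slightly darker results than doing the same operation in linear light.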
11 What is gamma correction? 📊 medium
Answer: Nonlinear mapping between stored values and displayed intensity to match human brightness perception and legacy CRT behavior. Applying gamma wrong can break color statistics and blur/threshold results.
12 What is a color gamut? 📊 medium
Answer: The range of colors a device or space can represent. Wide-gamut displays (P3) vs sRGB differ; out-of-gamut colors clip or map when converting—important for medical imaging and professional color.
13 What is a white point / illuminant? 🔥 hard
Answer: Reference neutral light (e.g. D65) for interpreting RGB values. Different cameras/AWB change apparent colors; robust pipelines account for illumination via white balance or learning.
14 What is 4:2:0 chroma subsampling? 📊 medium
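Answer: Luma is kept at full resolution while each chroma plane is subsampled by 2 both horizontally and vertically, leaving a quarter of the chroma samples. This exploits the eye's lower acuity for color. It can cause color fringing on sharp edges when decoded; relevant for video compression pipelines.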
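As a rough illustration of the sample counts, the sketch below averages each 2×2 block of a chroma plane. Real codecs differ in their downsampling filter and chroma siting, so the plain block average is an assumption, and `subsample_420` is an illustrative name.

```python
import numpy as np

def subsample_420(chroma):
    """Average each 2x2 block of a chroma plane (height and width assumed even)."""
    h, w = chroma.shape
    return chroma.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

cb = np.arange(16, dtype=np.float64).reshape(4, 4)  # toy 4x4 Cb plane
cb_small = subsample_420(cb)
print(cb_small.shape, cb_small.size / cb.size)
```

The luma plane would stay 4×4 here while each chroma plane drops to 2×2, so the three planes together need half the samples of the full-resolution version.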
15 Should you normalize each RGB channel separately? ⚡ easy
Answer: Sometimes for model input (zero mean / unit var per channel). For photometric consistency, consider normalization that preserves color ratios—or work in a space suited to the task (e.g. LAB L channel only).
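A minimal sketch of per-channel standardization. It computes the statistics from a single random image for brevity; in practice the mean and standard deviation usually come from the training set, which is an assumption outside the text above.

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(8, 8, 3)).astype(np.float32)  # stand-in H x W x C image

# One mean and one standard deviation per channel (reduce over H and W).
mean = img.mean(axis=(0, 1))
std = img.std(axis=(0, 1))
normalized = (img - mean) / std

print(normalized.mean(axis=(0, 1)), normalized.std(axis=(0, 1)))
```

Because each channel is shifted and scaled independently, ratios between channels are not preserved, which is the photometric caveat mentioned in the answer.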
16 How do augmentations interact with color space? 📊 medium
Answer: Random brightness/contrast often in RGB or HSV; hue jitter in HSV. Extreme hue shifts may leave gamut or break class semantics—keep augmentations label-safe.
17 Why is thresholding harder in RGB than gray? 📊 medium
Answer: RGB thresholding needs rules in 3D (ranges per channel or distance to a color). HSV can separate hue cone from lighting via S/V gating—still not perfect under colored illumination.
18 Histogram equalization: per RGB channel vs luminance only? ⚡ easy
Answer: Applying independently to R,G,B shifts colors (color cast). Often convert to LAB and equalize L only, or use CLAHE on luminance to preserve chroma.
19 Mention one approach to illumination invariance. 🔥 hard
Answer: Retinex-style ideas, white balance, homomorphic filtering (separate illumination/reflectance in log domain), or learning-based methods. Interviews reward naming tradeoffs (artifacts vs compute).
20 Typical order: decode → color convert → resize? 📊 medium
Answer: Often: load image → ensure correct color order → optional WB/gamma fix → resize/crop with good interpolation → normalize to tensor. Order matters: resize after linearization for photometric tasks; many DL pipelines keep it simple in sRGB uint8.

Image Transformations: 20 Essential Q&A

21 What is a geometric image transformation? ⚡ easy
Answer: A mapping that moves pixel locations—translation, rotation, scale, affine, or perspective—while optionally resampling intensities. It changes spatial layout but not the semantic label if the transform is label-consistent (e.g. bbox corners transformed too).
22 Define translation of an image. ⚡ easy
Answer: Shifting all pixels by offsets (tx, ty). Implemented by moving the sampling grid or adjusting the transform matrix with identity + translation column. Boundaries may require padding or cropping.
23 What is isotropic vs anisotropic scaling? ⚡ easy
Answer: Isotropic: same scale sx = sy preserves angles. Anisotropic: sx ≠ sy stretches content—can turn circles into ellipses. Know effect on aspect ratio for detection labels.
24 How is rotation about the origin represented in 2D? 📊 medium
Answer: Linear part is matrix [[cos θ, -sin θ],[sin θ, cos θ]]. In practice pick a rotation center (image center) via translate-rotate-translate composition. Large rotations need bigger canvas or cropping.
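The translate-rotate-translate composition can be sketched with 3×3 homogeneous matrices. The helper name `rotation_about` and the sample center are illustrative; sign conventions for the angle differ between libraries, so treat the direction of rotation as an assumption of this sketch.

```python
import numpy as np

def rotation_about(center, theta):
    """3x3 homogeneous matrix rotating by theta radians about center = (x, y)."""
    cx, cy = center
    c, s = np.cos(theta), np.sin(theta)
    to_origin = np.array([[1, 0, -cx], [0, 1, -cy], [0, 0, 1]], dtype=float)
    rotate = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]], dtype=float)
    back = np.array([[1, 0, cx], [0, 1, cy], [0, 0, 1]], dtype=float)
    # Rightmost matrix acts first on a column vector.
    return back @ rotate @ to_origin

M = rotation_about((50.0, 40.0), np.pi / 2)
center = M @ np.array([50.0, 40.0, 1.0])   # the rotation center should not move
moved = M @ np.array([60.0, 40.0, 1.0])    # a point 10 px to the right of the center
print(center, moved)
```

The center maps to itself, and the point 10 pixels along +x ends up 10 pixels along +y, which on a y-down image grid looks clockwise on screen.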
25 What does flipping do for ML? ⚡ easy
Answer: Horizontal flip is a common label-preserving augmentation for many object classes; vertical flip may break semantics (people, text, traffic scenes). Always validate against dataset semantics.
26 Homogeneous coordinates for 2D transforms? 📊 medium
Answer: Represent point (x,y) as (x,y,1). Allows affine maps as 3×3 matrices acting on homogeneous vectors, unifying translation with linear maps for composition.
27 What is an affine transformation? 📊 medium
Answer: Maps parallel lines to parallel lines: combination of linear transform and translation—rotation, scale, shear. Preserves ratios along lines but not necessarily lengths or angles unless constrained (similarity/euclidean).
28 How many degrees of freedom does a 2D affine map have? 📊 medium
Answer: Six (4 in the 2×2 linear part + 2 translation). You need 3 point correspondences (non-degenerate) to estimate it in general.
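Three correspondences give six equations for the six unknowns, which can be solved directly. The sketch below is one way to do it with NumPy; `estimate_affine` and the test matrix are illustrative, not a library API.

```python
import numpy as np

def estimate_affine(src, dst):
    """Solve for the 2x3 affine matrix mapping three src points onto three dst points."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    A = np.hstack([src, np.ones((3, 1))])  # each row is (x, y, 1)
    # A @ M.T = dst, so M.T = solve(A, dst); A is singular if the points are collinear.
    return np.linalg.solve(A, dst).T

true_M = np.array([[2.0, 0.5, 10.0], [-0.5, 1.5, -3.0]])
src = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
dst = src @ true_M[:, :2].T + true_M[:, 2]   # apply the known map to build targets
M = estimate_affine(src, dst)
print(M)
```

With more than three (noisy) correspondences the same system becomes overdetermined and is usually solved by least squares instead.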
29 How does perspective differ from affine? 🔥 hard
Answer: Projective maps preserve collinearity but not parallelism—parallel world lines can converge in the image (vanishing points). Needed for planes viewed at an angle, document scanning, and bird’s-eye view from ground cameras.
30 What is a homography? 🔥 hard
Answer: A 3×3 projective transform defined up to scale (8 DOF) that maps one plane to another under pinhole imaging. It relates two views of the same planar surface and can be estimated from at least 4 point correspondences, no three of them collinear, using the DLT.
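A bare-bones sketch of the DLT with NumPy's SVD, assuming exact correspondences and skipping the coordinate normalization and outlier rejection (e.g. RANSAC) that practical pipelines usually add. Function names and the test matrix are illustrative.

```python
import numpy as np

def estimate_homography(src, dst):
    """Direct Linear Transform from >= 4 point pairs; returns H scaled so H[2, 2] == 1."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(np.array(rows, dtype=float))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]  # assumes H[2, 2] is not zero

def apply_homography(H, pts):
    """Map N x 2 points through H, then divide by the homogeneous coordinate."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return pts_h[:, :2] / pts_h[:, 2:3]

true_H = np.array([[1.2, 0.1, 5.0], [-0.2, 0.9, 3.0], [0.001, 0.002, 1.0]])
src = np.array([[0.0, 0.0], [100.0, 0.0], [100.0, 100.0], [0.0, 100.0]])
dst = apply_homography(true_H, src)
H = estimate_homography(src, dst)
print(np.round(H, 4))
```

Each correspondence contributes two rows, so four points give the eight equations needed for the eight degrees of freedom.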
31 Forward vs inverse warping? 📊 medium
Answer: Forward: map source→dest can leave holes and overlaps. Inverse: for each destination pixel, sample source via inverse map—avoids gaps and is standard in OpenCV warp* with a chosen interpolator.
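Inverse warping can be sketched in a few lines: loop over destination pixels and look up the source. This version assumes an affine map (no perspective divide) and nearest-neighbor sampling; `warp_inverse` is an illustrative helper, not an OpenCV function.

```python
import numpy as np

def warp_inverse(src, M_inv, out_shape):
    """Inverse warp with nearest-neighbor sampling; M_inv maps dest (x, y, 1) to source."""
    h, w = out_shape
    out = np.zeros(out_shape, dtype=src.dtype)
    for y in range(h):
        for x in range(w):
            sx, sy, _ = M_inv @ np.array([x, y, 1.0])
            sx, sy = int(round(sx)), int(round(sy))
            # Destination pixels whose source falls outside the image stay 0.
            if 0 <= sx < src.shape[1] and 0 <= sy < src.shape[0]:
                out[y, x] = src[sy, sx]
    return out

src = np.arange(9).reshape(3, 3)
# The forward map shifts content right by 1 pixel, so the inverse subtracts 1 from x.
M_inv = np.array([[1.0, 0.0, -1.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
shifted = warp_inverse(src, M_inv, (3, 3))
print(shifted)
```

Every destination pixel is visited exactly once, which is why this direction leaves no holes; the price is that the inverse of the transform must be available.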
32 Why does warping need interpolation? 📊 medium
Answer: Mapped coordinates land between pixels. Nearest, bilinear, bicubic choose neighborhood weights—trade speed vs aliasing/blur. Downscaling may need prefiltering to avoid aliasing.
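import cv2
import numpy as np
img = np.zeros((240, 320, 3), dtype=np.uint8)  # placeholder image; replace with your own
h, w = img.shape[:2]
M = cv2.getRotationMatrix2D((w / 2, h / 2), 30, 1.0)  # center, angle in degrees, scale
out = cv2.warpAffine(img, M, (w, h), flags=cv2.INTER_LINEAR)  # bilinear interpolation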
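To make the interpolation step concrete, here is a small sketch of bilinear sampling at a non-integer coordinate. `bilinear_sample` is an illustrative helper and assumes the coordinate lies inside the image.

```python
import numpy as np

def bilinear_sample(img, x, y):
    """Sample a 2D array at real-valued (x, y); coordinates assumed inside the image."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1 = min(x0 + 1, img.shape[1] - 1)
    y1 = min(y0 + 1, img.shape[0] - 1)
    fx, fy = x - x0, y - y0
    # Blend horizontally on the two neighboring rows, then vertically.
    top = (1 - fx) * img[y0, x0] + fx * img[y0, x1]
    bottom = (1 - fx) * img[y1, x0] + fx * img[y1, x1]
    return (1 - fy) * top + fy * bottom

img = np.array([[0.0, 10.0], [20.0, 30.0]])
center_value = bilinear_sample(img, 0.5, 0.5)
print(center_value)  # 15.0, the average of the four neighbors
```

Nearest-neighbor would instead return one of the four corner values, which is faster but produces blockier output.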
33 Crop vs pad after transform? ⚡ easy
Answer: Rotation/scale can push content outside the original canvas—either expand canvas with padding (constant, reflect) or crop to a fixed size. Detection boxes must be clipped or transformed consistently.
34 Augmentation: random affine on segmentation masks? 📊 medium
Answer: Apply the same spatial map to image and mask (nearest-neighbor interpolation for label masks to avoid fractional classes). For instance segmentation, warp polygons or rasterize after transform.
35 What is image registration? 🔥 hard
Answer: Aligning two images of the same scene into a common coordinate frame—via feature matching + homography/affine, optical flow, or optimization. Used in medical imaging, panorama stitching, and super-resolution.
36 What is a similarity transform? 📊 medium
Answer: Rotation + uniform scale + translation (4 DOF in 2D). Preserves angles and ratios of lengths—good model when perspective effects are weak.
37 What is a rigid (Euclidean) transform? ⚡ easy
Answer: Rotation + translation only—preserves distances and angles (3 DOF in 2D). Models camera motion parallel to the plane or object pose without scale change.
38 How do you compose transforms? 📊 medium
Answer: Multiply their homogeneous matrices. With column vectors the rightmost matrix is applied first, so "apply A, then B" is the product B·A; row-vector conventions reverse this, so stay consistent with your library.
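A quick check that order matters, assuming the column-vector convention. The two matrices and the test point are arbitrary examples.

```python
import numpy as np

T = np.array([[1.0, 0.0, 10.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])  # translate x by 10
S = np.array([[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 1.0]])   # uniform scale by 2

p = np.array([1.0, 1.0, 1.0])          # point (1, 1) in homogeneous form
scale_then_translate = (T @ S) @ p     # S acts first
translate_then_scale = (S @ T) @ p     # T acts first
print(scale_then_translate, translate_then_scale)
```

Scaling first gives (12, 2), while translating first gives (22, 2), because the translation itself gets scaled in the second case.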
39 OpenCV: warpAffine vs warpPerspective? ⚡ easy
Answer: warpAffine uses a 2×3 affine map; warpPerspective uses full 3×3 homography. Choose based on whether parallelism must be preserved (affine) or full perspective correction is needed.
40 Are lens distortion and homography the same? 📊 medium
Answer: No—radial/tangential distortion is nonlinear and modeled separately (Brown-Conrady) before or jointly with pinhole projection. Undistort first, then apply homography for many planar AR/document pipelines.

Image Filtering: 20 Essential Q&A

41 What is image filtering? ⚡ easy
Answer: Computing a new image where each output pixel is a function of a neighborhood of input pixels. Linear filters use weighted sums (convolution/correlation); nonlinear filters include median, bilateral, morphological ops.
42 Define 2D convolution (discrete). 📊 medium
Answer: Slide a kernel over the image; at each location, sum of elementwise products of kernel and flipped neighborhood (strict convolution). Many libraries implement cross-correlation without flip—know which convention your framework uses.
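The difference between the two conventions shows up clearly on an impulse image. The sketch below uses plain loops with zero padding; both function names are illustrative and odd kernel sizes are assumed.

```python
import numpy as np

def correlate2d(img, kernel):
    """Same-size cross-correlation with zero padding (odd kernel sizes assumed)."""
    kh, kw = kernel.shape
    padded = np.pad(img, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="constant")
    out = np.zeros(img.shape, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

def convolve2d(img, kernel):
    """Strict convolution: correlate with the kernel flipped on both axes."""
    return correlate2d(img, kernel[::-1, ::-1])

impulse = np.zeros((5, 5))
impulse[2, 2] = 1.0
kernel = np.arange(9, dtype=float).reshape(3, 3)  # deliberately asymmetric

corr = correlate2d(impulse, kernel)
conv = convolve2d(impulse, kernel)
print(conv[1:4, 1:4])
print(corr[1:4, 1:4])
```

Convolving the impulse reproduces the kernel itself, while correlating it produces the kernel rotated by 180 degrees.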
43 Correlation vs convolution for symmetric kernels? 📊 medium
Answer: For symmetric kernels (Gaussian), results match. For asymmetric kernels (Sobel direction), flip matters for strict signal-processing convolution vs correlation.
44 What is a kernel / mask? ⚡ easy
Answer: Small matrix of weights defining neighborhood contributions. Size (e.g. 3×3, 5×5) sets spatial support; larger kernels increase blur radius and compute cost (~kernel area per pixel).
45 Examples of nonlinear filters? ⚡ easy
Answer: Median (order-statistic, good for salt-and-pepper), bilateral (edge-preserving smoothing), morphology (min/max). They do not obey superposition like convolutions.
46 Why use a Gaussian kernel? 📊 medium
Answer: Smooth low-pass filtering that reduces noise and high frequencies while avoiding sharp ringing like ideal low-pass. Separable implementation makes it fast; σ controls blur strength.
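import cv2
import numpy as np
img = np.zeros((64, 64), dtype=np.uint8)  # placeholder image; replace with your own
blur = cv2.GaussianBlur(img, (5, 5), sigmaX=1.0)  # 5x5 kernel; sigmaY defaults to sigmaX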
47 What does separable filter mean? 🔥 hard
Answer: A 2D kernel K is separable if it equals an outer product u vᵀ of two 1D kernels (for a Gaussian, u = v). Convolving with K is then equivalent to 1D convolution along rows followed by columns, so cost drops from O(WHk²) to O(2WHk) for k×k support.
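The equivalence can be checked numerically. This sketch compares a direct 2D convolution against two 1D passes, both with zero padding; the two factors happen to form the Sobel x kernel, and the function names are illustrative.

```python
import numpy as np

def conv_rows_then_cols(img, u, v):
    """Filter rows with v, then columns with u (zero-padded 'same' 1D convolutions)."""
    rows = np.apply_along_axis(np.convolve, 1, img, v, mode="same")
    return np.apply_along_axis(np.convolve, 0, rows, u, mode="same")

def conv2d_direct(img, K):
    """Direct same-size 2D convolution with zero padding (odd kernel sizes assumed)."""
    kh, kw = K.shape
    padded = np.pad(img, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="constant")
    flipped = K[::-1, ::-1]
    out = np.empty(img.shape, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * flipped)
    return out

u = np.array([1.0, 2.0, 1.0]) / 4.0   # vertical (column) factor
v = np.array([1.0, 0.0, -1.0])        # horizontal (row) factor
K = np.outer(u, v)                    # 3x3 separable kernel

rng = np.random.default_rng(0)
img = rng.random((6, 7))
direct = conv2d_direct(img, K)
separable = conv_rows_then_cols(img, u, v)
print(np.allclose(direct, separable))
```

For a 3×3 kernel the saving is small (9 versus 6 multiplies per pixel), but it grows linearly with kernel size, which is why large Gaussian blurs are almost always run as two 1D passes.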
48 Mean / box filter properties? ⚡ easy
Answer: Simple average of the neighborhood. It is fast (especially with integral images), but its frequency response is sinc-like, with nulls and side lobes that let some high frequencies through, so it can create blocky artifacts compared to Gaussian blur.
49 When prefer median filtering? 📊 medium
Answer: Impulsive salt-and-pepper noise where mean blur smears outliers. Median preserves edges better than Gaussian for that noise but is costlier and can remove thin structures.
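A small sketch contrasting a 3×3 median with a 3×3 mean on isolated impulses. The helper `windows3` is illustrative and uses edge-replicated borders, which is an assumed padding choice.

```python
import numpy as np

def windows3(img):
    """Stack the nine shifted 3x3-neighborhood views of img (edge-replicated borders)."""
    padded = np.pad(img, 1, mode="edge")
    h, w = img.shape
    return np.stack([padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)])

img = np.full((5, 5), 100.0)
img[2, 2] = 255.0   # a single "salt" pixel
img[0, 4] = 0.0     # a single "pepper" pixel

median = np.median(windows3(img), axis=0)
box_mean = windows3(img).mean(axis=0)
print(median[2, 2], round(box_mean[2, 2], 1))
```

The median restores the flat value of 100 everywhere, while the mean spreads the salt pixel into its neighbors (about 117 at the center).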
50 Intuition for bilateral filter? 🔥 hard
Answer: Weighted average where weights drop with both spatial distance and intensity difference—smooths flat regions but preserves sharp edges. Used for denoising and tone mapping; slower than Gaussian.
51 What is padding in convolution? 📊 medium
Answer: Extends the image border so output size can match input (same padding) or follow strict convolution (valid). Modes: zero, reflect, replicate, wrap—choice affects edges and CNN behavior.
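The padding modes map directly onto NumPy's `np.pad`, shown here on a 1D row for readability. OpenCV and deep-learning frameworks use their own names for the same ideas.

```python
import numpy as np

row = np.array([1, 2, 3])
zero = np.pad(row, 2, mode="constant")    # [0 0 1 2 3 0 0]
reflect = np.pad(row, 2, mode="reflect")  # [3 2 1 2 3 2 1]
replicate = np.pad(row, 2, mode="edge")   # [1 1 1 2 3 3 3]
wrap = np.pad(row, 2, mode="wrap")        # [2 3 1 2 3 1 2]

for name, padded in [("zero", zero), ("reflect", reflect), ("replicate", replicate), ("wrap", wrap)]:
    print(name, padded)
```

Zero padding introduces an artificial dark border, which is usually the source of the fringes that smoothing filters show at image edges.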
52 Why do edges look different after filtering? ⚡ easy
Answer: Neighborhoods at borders are incomplete; padding synthesizes missing neighbors. Wrong padding can cause dark/bright fringes—noticeable on small images and CNN feature maps.
53 Basic sharpening idea? 📊 medium
Answer: Emphasize high frequencies by adding a scaled Laplacian-like response or subtracting a blurred version from the original—makes edges pop but can amplify noise.
54 What is unsharp masking? 📊 medium
Answer: Enhancement: original + amount × (original − blurred). The difference is a high-boost of details; used in photography and preprocessing (with care for noise).
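The formula can be tried on a synthetic step edge. This sketch uses a 3×3 box blur as the low-pass step purely to stay dependency-free; a Gaussian blur is the more usual choice, and the function names are illustrative.

```python
import numpy as np

def box_blur3(img):
    """3x3 mean filter with edge-replicated borders."""
    padded = np.pad(img, 1, mode="edge")
    h, w = img.shape
    return sum(padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)) / 9.0

def unsharp_mask(img, amount=1.0):
    """original + amount * (original - blurred)"""
    return img + amount * (img - box_blur3(img))

img = np.zeros((5, 6))
img[:, 3:] = 100.0                     # vertical step edge between columns 2 and 3
sharp = unsharp_mask(img, amount=1.0)
print(np.round(sharp[2], 1))
```

Flat regions are unchanged, while the pixels next to the edge undershoot below 0 and overshoot above 100; for 8-bit output the result needs clipping before conversion.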
55 How is CNN stride related to classical filtering? 📊 medium
Answer: Stride >1 subsamples the output, like convolve-then-downsample. A larger stride shrinks the spatial size and makes the receptive field grow faster in later layers; this differs from the stride-1 spatial filtering used in preprocessing.
56 Frequency view of Gaussian blur? 🔥 hard
Answer: Gaussian in space ↔ Gaussian in frequency; it attenuates high frequencies smoothly. Helps before subsampling to limit aliasing (Nyquist)—ties back to image basics.
57 Gaussian noise vs salt-and-pepper—filter choice? 📊 medium
Answer: Gaussian noise: linear smoothing (Gaussian blur) or Wiener/BM3D-class methods at higher level. Salt-and-pepper: median or morphological openings/closings.
58 How do derivative filters relate to filtering? 📊 medium
Answer: Finite differences (Sobel/Prewitt) are short convolution kernels approximating gradients—high-pass. Often paired with prior Gaussian smoothing to reduce noise before edge detection.
59 Why normalize blur kernels? ⚡ easy
Answer: So the DC gain is 1, which preserves average brightness. A sampled Gaussian only sums to 1 after you divide by the sum of its weights; forgetting that step scales image intensity.
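A short numerical check of the DC-gain argument, using a sampled 1D Gaussian with an arbitrarily chosen size and σ. `gaussian_kernel1d` is an illustrative helper.

```python
import numpy as np

def gaussian_kernel1d(size, sigma):
    """Return the raw sampled Gaussian and its sum-to-one normalized version."""
    x = np.arange(size) - (size - 1) / 2.0
    raw = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return raw, raw / raw.sum()

raw, normalized = gaussian_kernel1d(5, 1.0)
flat = np.full(20, 50.0)  # constant signal with brightness 50

kept = np.convolve(flat, normalized, mode="valid")
scaled = np.convolve(flat, raw, mode="valid")
print(kept[0], scaled[0])
```

The normalized kernel returns 50 for the flat signal, while the raw kernel multiplies it by the kernel sum (about 2.48 here).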
60 OpenCV: GaussianBlur vs filter2D? ⚡ easy
Answer: GaussianBlur builds separable Gaussian internally. filter2D applies arbitrary kernel (correlation-style in OpenCV)—flexible for custom linear filters.

Edge Detection: 20 Essential Q&A

61 What is edge detection? ⚡ easy
Answer: Finding boundaries where intensity changes rapidly—object outlines, surface markings, shadows. Edges are local; full segmentation groups pixels into regions.
62 What is the image gradient ∇I? ⚡ easy
Answer: Vector of partial derivatives (Ix, Iy). Magnitude ‖∇I‖ shows edge strength; direction is perpendicular to the edge (along max rate of change).
63 How does the Sobel operator work? 📊 medium
Answer: Discrete 3×3 separable approximation of derivatives with slight smoothing (center weight). Gx and Gy kernels estimate Ix, Iy; combine for magnitude and angle.
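The Sobel kernels can be built from their two 1D factors and applied to a synthetic step edge. The sketch uses correlation without a kernel flip; under strict convolution the sign of the response would flip. `correlate_valid` is an illustrative helper.

```python
import numpy as np

smooth = np.array([1.0, 2.0, 1.0])   # smoothing across the derivative direction
diff = np.array([-1.0, 0.0, 1.0])    # central difference
sobel_x = np.outer(smooth, diff)     # responds to left-to-right intensity change
sobel_y = np.outer(diff, smooth)     # responds to top-to-bottom intensity change

def correlate_valid(img, kernel):
    """Cross-correlation over positions where the kernel fits entirely inside img."""
    kh, kw = kernel.shape
    h, w = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

img = np.zeros((5, 6))
img[:, 3:] = 1.0                     # vertical edge: dark left, bright right
gx = correlate_valid(img, sobel_x)
gy = correlate_valid(img, sobel_y)
magnitude = np.hypot(gx, gy)
print(gx[0], gy[0])
```

Only gx responds to the vertical edge, and the response spans two columns, which is the thickness that non-maximum suppression later thins down.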
64 How does Prewitt differ from Sobel? ⚡ easy
Answer: Similar 3×3 derivative masks; the weights differ slightly (Sobel's smoothing part is [1, 2, 1], Prewitt's is [1, 1, 1], so Sobel emphasizes the center more). Both approximate first derivatives, and results are often close in practice.
65 What does the Laplacian ∇²I detect? 📊 medium
Answer: Second derivative—zero-crossings align with edges. Sensitive to noise; often applied to Gaussian-smoothed image (LoG) for stability.
66 What are zero-crossings of ∇²(G*I)? 📊 medium
Answer: Locations where Laplacian of Gaussian changes sign—candidate edges. Need additional filtering to reduce spurious responses from noise.
67 Why blur with Gaussian before taking derivatives? 📊 medium
Answer: Differentiation amplifies noise. Gaussian low-pass reduces noise while keeping meaningful discontinuities; leads to LoG or smooth gradients for Canny.
68 List the Canny edge detector steps. 🔥 hard
Answer: 1) Gaussian smooth 2) gradient magnitude/direction 3) non-max suppression 4) hysteresis with high/low thresholds to link strong edges and reject weak noise.
69 What is non-maximum suppression (NMS)? 📊 medium
Answer: Thins edges: at each pixel keep magnitude only if it is a local max along the gradient direction—produces one-pixel-wide ridges.
70 What is hysteresis thresholding? 📊 medium
Answer: Use high T_hi to accept strong edges, low T_lo to continue chains from strong pixels—reduces broken edges while suppressing isolated weak noise.
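Hysteresis can be sketched as a flood fill that starts from strong pixels and spreads through weak ones. This is a simplified illustration with 8-connectivity on a toy magnitude map, not the exact routine inside any particular library; `hysteresis` is an illustrative name.

```python
import numpy as np

def hysteresis(mag, t_lo, t_hi):
    """Keep pixels >= t_hi plus any pixels >= t_lo that are 8-connected to them."""
    strong = mag >= t_hi
    candidate = mag >= t_lo
    edges = np.zeros(mag.shape, dtype=bool)
    stack = list(zip(*np.nonzero(strong)))   # seed with all strong pixels
    while stack:
        y, x = stack.pop()
        if edges[y, x]:
            continue
        edges[y, x] = True
        for ny in range(max(y - 1, 0), min(y + 2, mag.shape[0])):
            for nx in range(max(x - 1, 0), min(x + 2, mag.shape[1])):
                if candidate[ny, nx] and not edges[ny, nx]:
                    stack.append((ny, nx))
    return edges

mag = np.array([
    [0, 0, 0, 0, 0, 0],
    [9, 5, 5, 0, 0, 5],   # strong pixel, two connected weak ones, one isolated weak one
    [0, 0, 0, 0, 0, 0],
], dtype=float)
edges = hysteresis(mag, t_lo=4, t_hi=8)
print(edges[1])
```

The two weak pixels attached to the strong one survive, while the isolated weak pixel at the end of the row is rejected even though it has the same magnitude.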
71 Why are “thick” edges undesirable? ⚡ easy
Answer: Thick edges blur object boundaries, hurt subpixel localization, and complicate linking. NMS aims for single-pixel thickness.
72 Detect edges in RGB how? 📊 medium
Answer: Options: max/mean gradient across channels, convert to luminance first, or vector gradient methods. Channel-wise max is simple; luminance can discard chromatic edges.
73 Why is salt-and-pepper noise bad for edges? ⚡ easy
Answer: Creates spurious large gradients. Median filter first can help; Gaussian blur before derivatives is standard for Gaussian noise.
74 How does scale (σ) affect edges? 📊 medium
Answer: Large σ: fewer, smoother edges (coarse structure). Small σ: more detail and noise. Multi-scale edge detection combines responses at several σ.
75 What is the Laplacian of Gaussian (LoG) idea? 🔥 hard
Answer: Smooth with Gaussian, apply Laplacian, find zero-crossings—Marr-Hildreth approach. Approximated by Difference of Gaussians (DoG) in some pipelines.
77 Typical cv2.Canny parameters? ⚡ easy
Answer: threshold1, threshold2 for hysteresis (low/high), apertureSize for Sobel, L2gradient flag for magnitude formula. Tune for your noise and scale.
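import cv2
import numpy as np
img = np.zeros((64, 64), dtype=np.uint8)  # placeholder 8-bit grayscale image
e = cv2.Canny(img, 50, 150)  # threshold1=50 (low), threshold2=150 (high)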
78 Gradient direction vs edge normal? 📊 medium
Answer: The gradient points in the direction of steepest ascent, which is along the edge normal; the edge tangent is perpendicular to it.
79 How get sub-pixel edge location? 🔥 hard
Answer: Fit parabola to gradient magnitudes along normal, moment-based refinement, or optimization—used in metrology and calibration.
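The parabola fit has a closed form when three gradient-magnitude samples are taken at offsets -1, 0, +1 along the normal. The sketch below assumes the middle sample is the local maximum; the function name and the test profile are illustrative.

```python
def subpixel_peak_offset(m_prev, m_peak, m_next):
    """Offset of the vertex of the parabola through samples at -1, 0, +1."""
    denom = m_prev - 2.0 * m_peak + m_next
    if denom == 0.0:
        return 0.0  # flat neighborhood: no refinement possible
    return 0.5 * (m_prev - m_next) / denom

def profile(x):
    """Synthetic magnitude profile with its true peak at x = 0.3."""
    return 10.0 - (x - 0.3) ** 2

offset = subpixel_peak_offset(profile(-1), profile(0), profile(1))
print(offset)  # approximately 0.3
```

Adding the offset to the integer pixel position along the normal gives the refined edge location; because the test profile is itself a parabola, the recovery is exact up to rounding.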
80 Edges vs segmentation? ⚡ easy
Answer: Edges are local discontinuities; segmentation assigns each pixel to a region/object. Edges can guide segmentation (watershed, active contours, graphs).
