Image Processing Pipeline — Interview Q&A

Question 1

1 What is a color space? ⚡ easy

Answer

Answer: A coordinate system for representing colors as numeric tuples (e.g. three numbers for trichromatic display). Different spaces emphasize different properties—device RGB for screens, HSV for intuitive hue/saturation edits, LAB for perceptual distance.

Question 2

2 Describe the RGB additive model. ⚡ easy

Answer

Answer: Red, green, and blue primary lights are added together on displays. At 8 bits per channel each value ranges 0–255, and the three channels combine to reproduce colors on monitors. RGB values are device-dependent unless tied to a standard such as sRGB.

Question 3

3 Why mention BGR separately from RGB? ⚡ easy

Answer

Answer: Libraries like OpenCV load and store channels in B, G, R order. Channel-agnostic algorithms behave identically as long as the order is consistent, but visualization with RGB-based tools and pre-trained weights that expect RGB input need an explicit swap.

Question 4

4 What do H, S, V represent? 📊 medium

Answer

Answer: Hue (color tint on a wheel), Saturation (colorfulness vs gray), Value/Brightness (intensity). Cylindrical geometry separates chromatic from achromatic changes more intuitively than RGB for some tasks.
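A minimal sketch of the RGB to HSV mapping using Python's standard-library `colorsys` module, assuming 8-bit RGB input (the helper name `rgb8_to_hsv` is illustrative). It shows that darkening a color changes only V, and that grays have zero saturation. Note that OpenCV's 8-bit HSV stores hue as 0–179 (degrees halved), so hue thresholds do not transfer directly between libraries.

```python
import colorsys

def rgb8_to_hsv(r, g, b):
    """Convert 8-bit RGB to (hue in degrees, saturation, value), with S and V in [0, 1]."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return h * 360.0, s, v

print(rgb8_to_hsv(255, 0, 0))      # pure red: hue 0, fully saturated, full value
print(rgb8_to_hsv(128, 0, 0))      # darker red: same hue and saturation, lower value
print(rgb8_to_hsv(128, 128, 128))  # mid gray: saturation 0, hue meaningless
```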

Question 5

5 Interview: when should you preprocess with HSV? 📊 medium

Answer

Answer: Segmenting by hue ranges (e.g. brightly colored objects), thresholding saturation/value to suppress shadows and washed-out pixels more directly than per-channel RGB rules, and augmentations that tweak hue/saturation while preserving object identity.

Question 6

6 Why is LAB used in vision and graphics? 🔥 hard

Answer

Answer: L* is lightness; a*, b* are color-opponent dimensions. Euclidean distance in LAB approximates perceptual difference better than RGB. Useful for color transfer, quality metrics, and some clustering tasks.
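A sketch of 8-bit sRGB to CIE L*a*b*, plus the simplest perceptual distance (CIE76 ΔE, plain Euclidean distance in LAB). It assumes sRGB-encoded input, the standard linear-sRGB-to-XYZ matrix, and a D65 white point; the helper names are illustrative. Production code would normally use a library conversion, and the later CIEDE2000 formula tracks perception more closely than CIE76.

```python
def srgb_to_linear(c):
    """Inverse sRGB transfer function for a channel value in [0, 1]."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

def rgb8_to_lab(r, g, b):
    """8-bit sRGB -> CIE L*a*b* (D65 white), going through linear RGB and XYZ."""
    rl, gl, bl = (srgb_to_linear(v / 255.0) for v in (r, g, b))
    # Linear sRGB -> XYZ using the standard sRGB primaries (D65).
    x = 0.4124 * rl + 0.3576 * gl + 0.1805 * bl
    y = 0.2126 * rl + 0.7152 * gl + 0.0722 * bl
    z = 0.0193 * rl + 0.1192 * gl + 0.9505 * bl
    xn, yn, zn = 0.95047, 1.0, 1.08883  # D65 reference white

    def f(t):
        # Cube root with a linear segment near zero, as in the CIE definition.
        delta = 6.0 / 29.0
        return t ** (1.0 / 3.0) if t > delta ** 3 else t / (3 * delta ** 2) + 4.0 / 29.0

    fx, fy, fz = f(x / xn), f(y / yn), f(z / zn)
    return 116.0 * fy - 16.0, 500.0 * (fx - fy), 200.0 * (fy - fz)

def delta_e76(lab1, lab2):
    """CIE76 color difference: Euclidean distance between two L*a*b* triples."""
    return sum((p - q) ** 2 for p, q in zip(lab1, lab2)) ** 0.5

print(rgb8_to_lab(255, 255, 255))  # white: L* close to 100, a* and b* close to 0
print(delta_e76(rgb8_to_lab(200, 30, 30), rgb8_to_lab(205, 30, 30)))
```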

Question 7

7 What is YCbCr? 📊 medium

Answer

Answer: Separates luma (Y) from chrominance (Cb, Cr). Used in JPEG and video codecs because human vision is more sensitive to brightness than color—enabling chroma subsampling.
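A sketch of the full-range BT.601 conversion used by JPEG (JFIF), assuming 8-bit RGB values; video standards often use a limited-range variant with different offsets and scales instead. The Y term is the same weighted sum commonly used for RGB-to-grayscale conversion.

```python
def rgb_to_ycbcr(r, g, b):
    """Full-range BT.601 (JFIF) RGB -> YCbCr; chroma is offset by 128 for 8-bit storage."""
    y = 0.299 * r + 0.587 * g + 0.114 * b             # luma: weighted brightness
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b  # blue-difference chroma
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b  # red-difference chroma
    return y, cb, cr

def ycbcr_to_rgb(y, cb, cr):
    """Inverse of rgb_to_ycbcr (exact up to rounding of the published coefficients)."""
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return r, g, b

# A neutral gray carries no chroma: Cb and Cr sit at the 128 midpoint.
print(rgb_to_ycbcr(100, 100, 100))
```

Because all color information lives in Cb and Cr, an encoder can store those two planes at reduced resolution while keeping Y sharp.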

Question 8

8 Where does CMYK appear? ⚡ easy

Answer

Answer: Subtractive printing (cyan, magenta, yellow, key/black). Less common in core CV training; relevant for print QA, packaging inspection, and prepress—not for typical RGB camera pipelines.

Question 9

9 Is grayscale a “color space”? ⚡ easy

Answer

Answer: It is a single-channel intensity representation, often derived from RGB via weighted sum. It discards chrominance—good for edge detection and speed when color is irrelevant.

Question 10

10 What does linear RGB mean vs sRGB? 🔥 hard

Answer

Answer: Sensors measure roughly linear light, but stored images are usually gamma-encoded with the sRGB transfer function so that code values are spread more perceptually uniformly. Photometric algorithms (deblurring, relighting) should first linearize values with the inverse transfer function, because only linear values add like physical light.
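A minimal sketch of the piecewise sRGB transfer function and its inverse, for values normalized to [0, 1]. The function names are illustrative. The example averages a black and a white pixel (a two-tap box blur) both ways: averaging encoded values gives 0.5, while averaging in linear light and re-encoding gives a noticeably brighter value.

```python
def srgb_to_linear(c):
    """Decode an sRGB-encoded value in [0, 1] to linear light (inverse transfer function)."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

def linear_to_srgb(l):
    """Encode a linear-light value in [0, 1] with the sRGB transfer function."""
    return 12.92 * l if l <= 0.0031308 else 1.055 * l ** (1 / 2.4) - 0.055

naive = (0.0 + 1.0) / 2  # average of encoded values
linear = linear_to_srgb((srgb_to_linear(0.0) + srgb_to_linear(1.0)) / 2)
print(naive, linear)     # the linear-light average encodes to about 0.735
```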

Question 11

11 What is gamma correction? 📊 medium

Answer

Answer: A nonlinear mapping between stored values and displayed intensity, chosen to match human brightness perception and, historically, CRT response. Mishandling gamma (e.g. averaging or blurring encoded values as if they were linear) biases color statistics and shifts blur and threshold results.

Question 12

12 What is a color gamut? 📊 medium

Answer

Answer: The range of colors a device or color space can represent. Wide-gamut spaces such as Display P3 cover more colors than sRGB; out-of-gamut colors are clipped or gamut-mapped when converting, which matters for medical imaging and professional color work.

Question 13

13 What is a white point / illuminant? 🔥 hard

Answer

Answer: Reference neutral light (e.g. D65) for interpreting RGB values. Different cameras/AWB change apparent colors; robust pipelines account for illumination via white balance or learning.

Question 14

14 What is 4:2:0 chroma subsampling? 📊 medium

Answer

Answer: Full luma resolution but quarter resolution for chroma planes—exploits lower acuity for color. Can cause color fringing on sharp edges when decoded; relevant for video compression pipelines.
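A toy sketch of 4:2:0 subsampling on a single chroma plane stored as a list of rows. It assumes plain 2×2 averaging for downsampling and nearest-neighbor replication for upsampling; real encoders and decoders use various chroma-siting conventions and smoother filters. The example shows how a sharp chroma edge bleeds into a neighboring column, which is the color fringing mentioned above.

```python
def subsample_420(plane):
    """Average each 2x2 block of a chroma plane (list of rows); assumes even width and height."""
    return [
        [(plane[y][x] + plane[y][x + 1] + plane[y + 1][x] + plane[y + 1][x + 1]) / 4
         for x in range(0, len(plane[0]), 2)]
        for y in range(0, len(plane), 2)
    ]

def upsample_nearest(plane):
    """Replicate each chroma sample into a 2x2 block (the simplest decoder-side upsampling)."""
    out = []
    for row in plane:
        wide = [v for v in row for _ in range(2)]
        out.extend([wide, list(wide)])
    return out

# A sharp vertical chroma edge between columns 0 and 1 of a 4x4 Cr plane.
cr = [[200, 128, 128, 128] for _ in range(4)]
decoded = upsample_nearest(subsample_420(cr))
print(decoded[0])  # column 1 now carries chroma from column 0: a color fringe
```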

Question 15

15 Should you normalize each RGB channel separately? ⚡ easy

Answer

Answer: Sometimes for model input (zero mean / unit var per channel). For photometric consistency, consider normalization that preserves color ratios—or work in a space suited to the task (e.g. LAB L channel only).

Question 16

16 How do augmentations interact with color space? 📊 medium

Answer

Answer: Random brightness/contrast is often applied in RGB or HSV; hue jitter is usually done in HSV. Extreme hue shifts can produce unnatural colors or change class semantics (e.g. when color defines the class), so keep augmentations label-safe.

Question 17

17 Why is thresholding harder in RGB than gray? 📊 medium

Answer

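Answer: A grayscale threshold is a single cut on one axis, while RGB thresholding needs rules in 3D (ranges per channel or distance to a reference color). HSV lets you gate on a hue range and use S/V thresholds to reject dark or desaturated pixels, which separates color from lighting better, though still imperfectly under colored illumination.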

Question 18

18 Histogram equalization: per RGB channel vs luminance only? ⚡ easy

Answer

Answer: Applying independently to R,G,B shifts colors (color cast). Often convert to LAB and equalize L only, or use CLAHE on luminance to preserve chroma.
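A minimal sketch of global histogram equalization on one channel (a flat list of 8-bit gray levels), using the common CDF remapping formula. The function name is illustrative; for real images OpenCV offers `equalizeHist` and `createCLAHE`. For a color image the same mapping would be applied to a luminance channel such as LAB's L rather than to R, G, and B separately.

```python
def equalize(pixels, levels=256):
    """Global histogram equalization of a flat list of integer gray levels in [0, levels)."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    # Cumulative distribution function as running counts.
    cdf, total = [], 0
    for count in hist:
        total += count
        cdf.append(total)
    cdf_min = next(c for c in cdf if c > 0)
    n = len(pixels)
    if n == cdf_min:  # constant image: nothing to stretch
        return list(pixels)
    # Map each level so the output CDF is approximately uniform over [0, levels - 1].
    lut = [round(max(0, c - cdf_min) / (n - cdf_min) * (levels - 1)) for c in cdf]
    return [lut[p] for p in pixels]

print(equalize([10, 20, 20, 30]))  # a narrow range gets stretched over the full 0-255 scale
```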

Question 19

19 Mention one approach to illumination invariance. 🔥 hard

Answer

Answer: Retinex-style ideas, white balance, homomorphic filtering (separate illumination/reflectance in log domain), or learning-based methods. Interviews reward naming tradeoffs (artifacts vs compute).

Question 20

20 Typical order: decode → color convert → resize? 📊 medium

Answer

Answer: Often: load image → ensure correct color order → optional WB/gamma fix → resize/crop with good interpolation → normalize to tensor. Order matters: resize after linearization for photometric tasks; many DL pipelines keep it simple in sRGB uint8.

Question 21

21 What is a geometric image transformation? ⚡ easy

Answer

Answer: A mapping that moves pixel locations—translation, rotation, scale, affine, or perspective—while optionally resampling intensities. It changes spatial layout but not the semantic label if the transform is label-consistent (e.g. bbox corners transformed too).

Question 22

22 Define translation of an image. ⚡ easy

Answer

Answer: Shifting all pixels by offsets (tx, ty). Implemented by moving the sampling grid, or with an affine matrix that is the identity plus a translation column, [[1, 0, tx], [0, 1, ty]]. Content shifted past the boundary must be handled by padding or cropping.

Question 23

23 What is isotropic vs anisotropic scaling? ⚡ easy

Answer

Answer: Isotropic scaling uses the same factor (sx = sy) and preserves angles. Anisotropic scaling (sx ≠ sy) stretches content, turning circles into ellipses and changing aspect ratio, so detection boxes must be scaled by the same sx and sy.

Question 24

24 How is rotation about the origin represented in 2D? 📊 medium

Answer

Answer: Linear part is matrix [[cos θ, -sin θ],[sin θ, cos θ]]. In practice pick a rotation center (image center) via translate-rotate-translate composition. Large rotations need bigger canvas or cropping.
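A dependency-free sketch of rotation about an image center, built with the translate-rotate-translate composition on 3×3 homogeneous matrices. The helper names are hypothetical. With column vectors the rightmost factor acts first, and matrix products do not commute, as the last lines show. In image coordinates (y pointing down), a positive θ in this matrix appears clockwise on screen.

```python
import math

def matmul(a, b):
    """Multiply two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def translation(tx, ty):
    return [[1, 0, tx], [0, 1, ty], [0, 0, 1]]

def rotation(theta):
    """Rotation about the origin by theta radians (counter-clockwise when the y-axis points up)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def rotate_about(theta, cx, cy):
    """Move (cx, cy) to the origin, rotate, move back; the rightmost factor acts first."""
    return matmul(translation(cx, cy), matmul(rotation(theta), translation(-cx, -cy)))

def apply(m, x, y):
    """Map (x, y) through a 3x3 matrix using homogeneous coordinates (x, y, 1)."""
    u = m[0][0] * x + m[0][1] * y + m[0][2]
    v = m[1][0] * x + m[1][1] * y + m[1][2]
    w = m[2][0] * x + m[2][1] * y + m[2][2]
    return u / w, v / w

m = rotate_about(math.pi / 2, 50, 50)  # 90 degrees about the center of a 100x100 image
print(apply(m, 50, 50))   # the center stays fixed (up to floating-point error)
print(apply(m, 100, 50))  # approximately (50, 100)

# Order matters: translate-after-rotate differs from rotate-after-translate.
print(apply(matmul(translation(10, 0), rotation(math.pi / 2)), 1, 0))  # about (10, 1)
print(apply(matmul(rotation(math.pi / 2), translation(10, 0)), 1, 0))  # about (0, 11)
```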

Question 25

25 What does flipping do for ML? ⚡ easy

Answer

Answer: Horizontal flip is a common label-preserving augmentation for many object classes; vertical flip may break semantics (people, text, traffic scenes). Always validate against dataset semantics.

Question 26

26 Homogeneous coordinates for 2D transforms? 📊 medium

Answer

Answer: Represent point (x,y) as (x,y,1). Allows affine maps as 3×3 matrices acting on homogeneous vectors, unifying translation with linear maps for composition.

Question 27

27 What is an affine transformation? 📊 medium

Answer

Answer: Maps parallel lines to parallel lines: combination of linear transform and translation—rotation, scale, shear. Preserves ratios along lines but not necessarily lengths or angles unless constrained (similarity/euclidean).

Question 28

28 How many degrees of freedom does a 2D affine map have? 📊 medium

Answer

Answer: Six (4 in the 2×2 linear part + 2 translation). Each point correspondence gives two equations, so 3 non-collinear correspondences determine it exactly.

Question 29

29 How does perspective differ from affine? 🔥 hard

Answer

Answer: Projective maps preserve collinearity but not parallelism—parallel world lines can converge in the image (vanishing points). Needed for planes viewed at an angle, document scanning, and bird’s-eye view from ground cameras.

Question 30

30 What is a homography? 🔥 hard

Answer

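Answer: A 3×3 projective transform, defined up to scale (8 DOF), mapping one plane to another under pinhole imaging. It relates two views of the same planar surface. At least 4 correspondences with no three collinear are needed; the DLT solves for it linearly, usually inside RANSAC when matches contain outliers.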
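A pure-Python sketch that recovers a homography exactly from 4 point pairs. Instead of the SVD-based DLT, it uses the simpler variant that fixes h33 = 1, leaving 8 unknowns and 8 linear equations solved by Gaussian elimination; this variant fails if the true h33 is 0, which the SVD form avoids. All names are illustrative; with more than 4 noisy matches you would solve in a least-squares sense with RANSAC (e.g. OpenCV's `findHomography`).

```python
def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate correspondences (e.g. three collinear points)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def homography_from_4(src, dst):
    """Exact homography from 4 pairs (x, y) -> (u, v), fixing h33 = 1."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u * (h31 x + h32 y + 1) = h11 x + h12 y + h13, and likewise for v.
        a.append([x, y, 1, 0, 0, 0, -x * u, -y * u])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -x * v, -y * v])
        b.append(v)
    h = solve(a, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]

def project(h, x, y):
    """Apply a homography to (x, y) and divide by the homogeneous coordinate."""
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)

# Map the unit square onto an arbitrary quadrilateral (e.g. a document seen at an angle).
src = [(0, 0), (1, 0), (1, 1), (0, 1)]
dst = [(10, 10), (110, 20), (100, 120), (5, 100)]
H = homography_from_4(src, dst)
print(project(H, 0.5, 0.5))  # where the square's center lands in the quadrilateral
```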

Question 31

31 Forward vs inverse warping? 📊 medium

Answer

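Answer: Forward warping maps each source pixel to the destination and can leave holes and overlaps. Inverse warping loops over destination pixels and samples the source at the inverse-mapped location, which avoids gaps; it is the standard approach in OpenCV's warpAffine and warpPerspective, combined with a chosen interpolator.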

Question 32

32 Why does warping need interpolation? 📊 medium

Answer

Answer: Mapped coordinates land between pixels. Nearest, bilinear, bicubic choose neighborhood weights—trade speed vs aliasing/blur. Downscaling may need prefiltering to avoid aliasing.
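A small sketch of inverse warping with bilinear interpolation on a grayscale image stored as a list of rows. The names `bilinear` and `warp_inverse` are illustrative, and out-of-bounds samples simply return a constant fill value (constant padding). The example shifts an image right by half a pixel, so every output lands between two input pixels and gets a weighted blend.

```python
import math

def bilinear(img, x, y, fill=0.0):
    """Sample img at real-valued (x, y) by bilinear interpolation; outside the image -> fill."""
    h, w = len(img), len(img[0])
    if x < 0 or y < 0 or x > w - 1 or y > h - 1:
        return fill
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * img[y0][x0] + fx * img[y0][x1]
    bottom = (1 - fx) * img[y1][x0] + fx * img[y1][x1]
    return (1 - fy) * top + fy * bottom

def warp_inverse(img, inv_map, out_h, out_w):
    """For every destination pixel (u, v), look up its source location and sample it."""
    return [[bilinear(img, *inv_map(u, v)) for u in range(out_w)] for v in range(out_h)]

# Shift right by half a pixel: destination (u, v) samples source (u - 0.5, v).
src = [[0, 100, 200], [0, 100, 200]]
out = warp_inverse(src, lambda u, v: (u - 0.5, v), 2, 3)
print(out)  # each row becomes [0, 50.0, 150.0]: the first sample falls outside and is filled
```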

Question 33

33 Crop vs pad after transform? ⚡ easy

Answer

Answer: Rotation/scale can push content outside the original canvas—either expand canvas with padding (constant, reflect) or crop to a fixed size. Detection boxes must be clipped or transformed consistently.

Question 34

34 Augmentation: random affine on segmentation masks? 📊 medium

Answer

Answer: Apply the same spatial map to image and mask (nearest-neighbor interpolation for label masks to avoid fractional classes). For instance segmentation, warp polygons or rasterize after transform.

Question 35

35 What is image registration? 🔥 hard

Answer

Answer: Aligning two images of the same scene into a common coordinate frame—via feature matching + homography/affine, optical flow, or optimization. Used in medical imaging, panorama stitching, and super-resolution.

Question 36

36 What is a similarity transform? 📊 medium

Answer

Answer: Rotation + uniform scale + translation (4 DOF in 2D). Preserves angles and ratios of lengths—good model when perspective effects are weak.

Question 37

37 What is a rigid (Euclidean) transform? ⚡ easy

Answer

Answer: Rotation + translation only—preserves distances and angles (3 DOF in 2D). Models camera motion parallel to the plane or object pose without scale change.

Question 38

38 How do you compose transforms? 📊 medium

Answer

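Answer: Multiply their homogeneous matrices. With column vectors the rightmost matrix is applied first, so applying T1, then T2, then T3 gives M = T3 · T2 · T1. Libraries that use row vectors reverse the order, so stay consistent with your library's convention.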

Question 39

39 OpenCV: warpAffine vs warpPerspective? ⚡ easy

Answer

Answer: warpAffine uses a 2×3 affine map; warpPerspective uses full 3×3 homography. Choose based on whether parallelism must be preserved (affine) or full perspective correction is needed.

Question 40

40 Are lens distortion and homography the same? 📊 medium

Answer

Answer: No—radial/tangential distortion is nonlinear and modeled separately (Brown-Conrady) before or jointly with pinhole projection. Undistort first, then apply homography for many planar AR/document pipelines.

Question 41

41 What is image filtering? ⚡ easy

Answer

Answer: Computing a new image where each output pixel is a function of a neighborhood of input pixels. Linear filters use weighted sums (convolution/correlation); nonlinear filters include median, bilateral, morphological ops.

Question 42

42 Define 2D convolution (discrete). 📊 medium

Answer

Answer: Slide a kernel over the image; at each location, take the sum of elementwise products of the neighborhood and the kernel flipped in both axes: (I ∗ K)(x, y) = Σ_i Σ_j K(i, j) · I(x − i, y − j). Many libraries (and CNN layers) actually implement cross-correlation, which skips the flip, so know which convention your framework uses.
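A pure-Python sketch contrasting the two conventions in 'valid' mode (output only where the kernel fits entirely inside the image). The function names are illustrative. Strict convolution is implemented as correlation with the kernel flipped in both axes; an asymmetric derivative kernel shows the sign flip, while a symmetric kernel gives identical results.

```python
def correlate2d(img, k):
    """'Valid' cross-correlation: out[y][x] = sum of k[i][j] * img[y + i][x + j] (no flip)."""
    kh, kw = len(k), len(k[0])
    h, w = len(img), len(img[0])
    return [[sum(k[i][j] * img[y + i][x + j] for i in range(kh) for j in range(kw))
             for x in range(w - kw + 1)]
            for y in range(h - kh + 1)]

def convolve2d(img, k):
    """'Valid' strict convolution: correlation with the kernel flipped in both axes."""
    flipped = [row[::-1] for row in k[::-1]]
    return correlate2d(img, flipped)

img = [[0, 1, 2, 3] for _ in range(3)]  # intensity increases to the right
kx = [[-1, 0, 1]]                       # asymmetric horizontal difference kernel
print(correlate2d(img, kx))  # positive responses
print(convolve2d(img, kx))   # same magnitude, opposite sign
```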

Question 43

43 Correlation vs convolution for symmetric kernels? 📊 medium

Answer

Answer: For symmetric kernels (e.g. Gaussian), the results are identical. For asymmetric kernels such as directional Sobel masks the flip matters: Sobel kernels are antisymmetric, so correlation and strict convolution give responses of opposite sign.

Question 44

44 What is a kernel / mask? ⚡ easy

Answer

Answer: Small matrix of weights defining neighborhood contributions. Size (e.g. 3×3, 5×5) sets spatial support; larger kernels increase blur radius and compute cost (~kernel area per pixel).

Question 45

45 Examples of nonlinear filters? ⚡ easy

Answer

Answer: Median (order-statistic, good for salt-and-pepper), bilateral (edge-preserving smoothing), morphology (min/max). They do not obey superposition like convolutions.

Question 46

46 Why use a Gaussian kernel? 📊 medium

Answer

Answer: It is a smooth low-pass filter that reduces noise and high frequencies without the ringing an ideal (brick-wall) low-pass filter causes. It is separable, so it is fast, and σ controls blur strength.

Question 47

47 What does separable filter mean? 🔥 hard

Answer

Answer: A 2D kernel K is separable if it equals an outer product K = u vᵀ (for a Gaussian, u = v). Convolving with K is then equivalent to a 1D convolution along rows followed by one along columns, which cuts the cost from O(WHk²) to O(2WHk) for k×k support.
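A sketch verifying separability on integers, so the comparison is exact. It assumes the 3-tap binomial kernel [1, 2, 1] (a small unnormalized Gaussian approximation) and 'valid' correlation; the helper names are illustrative. A row pass followed by a column pass matches the full 3×3 filter while using 6 instead of 9 multiplies per output pixel.

```python
def correlate_rows(img, k):
    """'Valid' 1D correlation along each row."""
    n = len(k)
    return [[sum(k[j] * row[x + j] for j in range(n)) for x in range(len(row) - n + 1)]
            for row in img]

def correlate_cols(img, k):
    """'Valid' 1D correlation along each column (transpose, filter rows, transpose back)."""
    transposed = [list(col) for col in zip(*img)]
    return [list(r) for r in zip(*correlate_rows(transposed, k))]

def correlate2d(img, k):
    """'Valid' 2D correlation with a full kernel."""
    kh, kw = len(k), len(k[0])
    return [[sum(k[i][j] * img[y + i][x + j] for i in range(kh) for j in range(kw))
             for x in range(len(img[0]) - kw + 1)]
            for y in range(len(img) - kh + 1)]

v = [1, 2, 1]                        # 1D binomial kernel
K = [[a * b for b in v] for a in v]  # outer product: [[1, 2, 1], [2, 4, 2], [1, 2, 1]]
img = [[(3 * x + 7 * y) % 10 for x in range(6)] for y in range(5)]

full_2d = correlate2d(img, K)                       # k*k multiplies per output pixel
separable = correlate_cols(correlate_rows(img, v), v)  # 2k multiplies per output pixel
print(full_2d == separable)
```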

Question 48

48 Mean / box filter properties? ⚡ easy

Answer

Answer: A simple average of the neighborhood. It is very fast (constant time per pixel with integral images), but its frequency response is a sinc with nulls and rippling sidelobes rather than the Gaussian's smooth falloff, so it can leave blocky artifacts compared to Gaussian blur.
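A sketch of the integral-image (summed-area table) trick behind fast box filtering. The table gets an extra zero row and column so any rectangle sum needs only four lookups, making the per-pixel cost independent of the box size k. Function names are illustrative and only the 'valid' region is computed.

```python
def integral_image(img):
    """Summed-area table with a zero top row and left column: s[y][x] = sum of img[:y][:x]."""
    h, w = len(img), len(img[0])
    s = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            s[y + 1][x + 1] = img[y][x] + s[y][x + 1] + s[y + 1][x] - s[y][x]
    return s

def box_sum(s, x0, y0, x1, y1):
    """Sum over rows y0..y1-1 and columns x0..x1-1 in O(1) via four table lookups."""
    return s[y1][x1] - s[y0][x1] - s[y1][x0] + s[y0][x0]

def box_mean_valid(img, k):
    """k x k mean filter over the 'valid' region using the integral image."""
    s = integral_image(img)
    h, w = len(img), len(img[0])
    return [[box_sum(s, x, y, x + k, y + k) / (k * k) for x in range(w - k + 1)]
            for y in range(h - k + 1)]

img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(box_mean_valid(img, 2))  # each output is the mean of one 2x2 window
```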

Question 49

49 When prefer median filtering? 📊 medium

Answer

Answer: Impulsive salt-and-pepper noise where mean blur smears outliers. Median preserves edges better than Gaussian for that noise but is costlier and can remove thin structures.
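A minimal 3×3 median filter sketch in pure Python, assuming border pixels are simply copied (real implementations pad instead). The example plants one "salt" and one "pepper" pixel in a flat image; both are removed, whereas a mean filter would smear them into their neighbors. A clean step edge passes through unchanged.

```python
def median3x3(img):
    """3x3 median filter on a list of rows; border pixels are copied unchanged for simplicity."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = sorted(img[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1))
            out[y][x] = window[4]  # middle of the 9 sorted values
    return out

noisy = [[10] * 5 for _ in range(5)]
noisy[2][2] = 255  # salt
noisy[1][3] = 0    # pepper
print(median3x3(noisy))  # both outliers are replaced by 10

step = [[0, 0, 100, 100, 100] for _ in range(4)]
print(median3x3(step) == step)  # the sharp edge is preserved
```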

Question 50

50 Intuition for bilateral filter? 🔥 hard

Answer

Answer: Weighted average where weights drop with both spatial distance and intensity difference—smooths flat regions but preserves sharp edges. Used for denoising and tone mapping; slower than Gaussian.

Question 51

51 What is padding in convolution? 📊 medium

Answer

Answer: Extending the image border so the output can match the input size ('same' padding); with no padding ('valid'), a k×k kernel shrinks each dimension by k − 1. Modes include zero, reflect, replicate, and wrap; the choice affects border pixels and CNN behavior.

Question 52

52 Why do edges look different after filtering? ⚡ easy

Answer

Answer: Neighborhoods at borders are incomplete; padding synthesizes missing neighbors. Wrong padding can cause dark/bright fringes—noticeable on small images and CNN feature maps.

Question 53

53 Basic sharpening idea? 📊 medium

Answer

Answer: Emphasize high frequencies by adding a scaled Laplacian-like response or subtracting a blurred version from the original—makes edges pop but can amplify noise.

Question 54

54 What is unsharp masking? 📊 medium

Answer

Answer: An enhancement: original + amount × (original − blurred). The difference (original − blurred) is the high-frequency detail layer, so adding it back boosts detail; used in photography and preprocessing, with care because it also amplifies noise.
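A 1D sketch of unsharp masking on a step edge. For simplicity it assumes a 3-tap box blur with replicated ends in place of the usual Gaussian; names are illustrative. Flat regions are untouched, while the edge gains an undershoot on the dark side and an overshoot on the bright side, the "halo" that makes edges look crisper.

```python
def box_blur3(signal):
    """3-tap mean blur; the two end samples are replicated (replicate padding)."""
    padded = [signal[0]] + list(signal) + [signal[-1]]
    return [(padded[i] + padded[i + 1] + padded[i + 2]) / 3 for i in range(len(signal))]

def unsharp_mask(signal, amount=1.0):
    """original + amount * (original - blurred): adds back a scaled high-frequency layer."""
    blurred = box_blur3(signal)
    return [s + amount * (s - b) for s, b in zip(signal, blurred)]

step = [10, 10, 10, 40, 40, 40]
print(box_blur3(step))     # the edge is spread over two samples
print(unsharp_mask(step))  # undershoot to 0 and overshoot to 50 around the edge
```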

Question 55

55 How is CNN stride related to classical filtering? 📊 medium

Answer

Answer: A stride greater than 1 subsamples the output, equivalent to convolving and then keeping every s-th sample. Larger strides shrink spatial size and make the receptive field grow faster in later layers, unlike the stride-1 filtering typical in preprocessing.

Question 56

56 Frequency view of Gaussian blur? 🔥 hard

Answer

Answer: The Fourier transform of a Gaussian is again a Gaussian, so the blur attenuates high frequencies smoothly. Blurring before subsampling limits aliasing by suppressing content above the new Nyquist frequency.

Question 57

57 Gaussian noise vs salt-and-pepper—filter choice? 📊 medium

Answer

Answer: Gaussian noise: linear smoothing (Gaussian blur) or Wiener/BM3D-class methods at higher level. Salt-and-pepper: median or morphological openings/closings.

Question 58

58 How do derivative filters relate to filtering? 📊 medium

Answer

Answer: Finite differences (Sobel/Prewitt) are short convolution kernels approximating gradients—high-pass. Often paired with prior Gaussian smoothing to reduce noise before edge detection.

Question 59

59 Why normalize blur kernels? ⚡ easy

Answer

Answer: So the DC gain is 1 and average brightness is preserved. A sampled, truncated Gaussian does not sum exactly to 1, so divide the taps by their sum; forgetting this scales image intensity.
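A sketch of building a normalized 1D Gaussian kernel, assuming the common heuristic of truncating at about ±3σ; the function name is illustrative. Without the final division, the raw taps for σ = 1 sum to roughly 2.5, so blurring would brighten the image by that factor.

```python
import math

def gaussian_kernel_1d(sigma, radius=None):
    """Sampled 1D Gaussian, normalized so its taps sum to 1 (unit DC gain)."""
    if radius is None:
        radius = int(math.ceil(3 * sigma))  # heuristic: cover about +/-3 sigma
    taps = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(taps)
    return [t / total for t in taps]

k = gaussian_kernel_1d(1.0)
flat = [100.0] * len(k)
print(sum(w * v for w, v in zip(k, flat)))  # about 100: a flat region keeps its brightness
raw_sum = sum(math.exp(-(i * i) / 2.0) for i in range(-3, 4))
print(raw_sum)  # about 2.51: the gain an unnormalized kernel would apply
```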

Question 60

60 OpenCV: GaussianBlur vs filter2D? ⚡ easy

Answer

Answer: GaussianBlur builds a separable Gaussian kernel internally. filter2D applies an arbitrary kernel (as correlation, without flipping), which makes it flexible for custom linear filters.

Question 61

61 What is edge detection? ⚡ easy

Answer

Answer: Finding boundaries where intensity changes rapidly—object outlines, surface markings, shadows. Edges are local; full segmentation groups pixels into regions.

Question 62

62 What is the image gradient ∇I? ⚡ easy

Answer

Answer: Vector of partial derivatives (Ix, Iy). Magnitude ‖∇I‖ shows edge strength; direction is perpendicular to the edge (along max rate of change).

Question 63

63 How does the Sobel operator work? 📊 medium

Answer

Answer: Discrete 3×3 separable approximation of derivatives with slight smoothing (center weight). Gx and Gy kernels estimate Ix, Iy; combine for magnitude and angle.
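A pure-Python sketch of the Sobel gradient at a single interior pixel, assuming the standard 3×3 masks applied as correlation and image coordinates with y pointing down. The helper `sobel_at` is illustrative. Each mask is the outer product of a [1, 2, 1] smoothing vector and a [−1, 0, 1] difference vector, which is why the operator is separable. For a vertical edge the gradient points horizontally (0°), perpendicular to the edge.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]  # horizontal change (Ix)
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]  # vertical change (Iy), y pointing down

def sobel_at(img, x, y):
    """Return (gx, gy, magnitude, angle in degrees) at interior pixel (x, y)."""
    gx = sum(SOBEL_X[j][i] * img[y + j - 1][x + i - 1] for j in range(3) for i in range(3))
    gy = sum(SOBEL_Y[j][i] * img[y + j - 1][x + i - 1] for j in range(3) for i in range(3))
    return gx, gy, math.hypot(gx, gy), math.degrees(math.atan2(gy, gx))

# Vertical edge: dark on the left, bright on the right.
img = [[0, 0, 100, 100] for _ in range(4)]
print(sobel_at(img, 1, 1))  # strong gx, zero gy, angle 0 degrees
```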

Question 64

64 How does Prewitt differ from Sobel? ⚡ easy

Answer

Answer: Both are 3×3 first-derivative masks whose weights differ slightly: Sobel gives the center row/column weight 2, while Prewitt weights all three equally. Results are often close in practice.

Answer 65

Answer: Second derivative—zero-crossings align with edges. Sensitive to noise; often applied to Gaussian-smoothed image (LoG) for stability.

Answer 66

Answer: Locations where Laplacian of Gaussian changes sign—candidate edges. Need additional filtering to reduce spurious responses from noise.

Answer 67

Answer: Differentiation amplifies noise. Gaussian low-pass reduces noise while keeping meaningful discontinuities; leads to LoG or smooth gradients for Canny.

Answer 68

Answer: 1) Gaussian smooth 2) gradient magnitude/direction 3) non-max suppression 4) hysteresis with high/low thresholds to link strong edges and reject weak noise.

Answer 69

Answer: Thins edges: at each pixel keep magnitude only if it is a local max along the gradient direction—produces one-pixel-wide ridges.

Answer 70

Answer: Use high T_hi to accept strong edges, low T_lo to continue chains from strong pixels—reduces broken edges while suppressing isolated weak noise.
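A sketch of the hysteresis step as a breadth-first flood fill over a gradient-magnitude grid, assuming the magnitudes have already been thinned by non-maximum suppression. The function name, the use of 8-connectivity, and the inclusive `>=` comparisons are choices made for this illustration. A weak chain touching a strong pixel survives; an isolated weak pixel is dropped.

```python
from collections import deque

def hysteresis(mag, t_lo, t_hi):
    """Keep pixels >= t_hi, plus pixels >= t_lo that are 8-connected to a kept pixel."""
    h, w = len(mag), len(mag[0])
    keep = [[False] * w for _ in range(h)]
    queue = deque()
    for y in range(h):  # seed with strong edge pixels
        for x in range(w):
            if mag[y][x] >= t_hi:
                keep[y][x] = True
                queue.append((y, x))
    while queue:  # grow into connected weak pixels
        y, x = queue.popleft()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and not keep[ny][nx] and mag[ny][nx] >= t_lo:
                    keep[ny][nx] = True
                    queue.append((ny, nx))
    return keep

mag = [[0, 90, 40, 40, 0, 0],
       [0, 0, 0, 0, 0, 45]]  # strong pixel with a weak tail, plus an isolated weak pixel
print(hysteresis(mag, t_lo=30, t_hi=80))
```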

Answer 71

Answer: Thick edges blur object boundaries, hurt subpixel localization, and complicate linking. NMS aims for single-pixel thickness.

Answer 72

Answer: Options: max/mean gradient across channels, convert to luminance first, or vector gradient methods. Channel-wise max is simple; luminance can discard chromatic edges.

Answer 73

Answer: Creates spurious large gradients. Median filter first can help; Gaussian blur before derivatives is standard for Gaussian noise.

Answer 74

Answer: Large σ: fewer, smoother edges (coarse structure). Small σ: more detail and noise. Multi-scale edge detection combines responses at several σ.

Answer 75

Answer: Smooth with Gaussian, apply Laplacian, find zero-crossings—Marr-Hildreth approach. Approximated by Difference of Gaussians (DoG) in some pipelines.

Answer 76

Answer: Connecting broken edge pixels into curves using proximity, direction continuity, or global optimization—step after local edge detection for contours.

Answer 77

Answer: threshold1, threshold2 for hysteresis (low/high), apertureSize for Sobel, L2gradient flag for magnitude formula. Tune for your noise and scale.

Answer 78

Answer: Gradient points in direction of steepest ascent; edge normal is often aligned with gradient; edge tangent is perpendicular.

Answer 79

Answer: Fit parabola to gradient magnitudes along normal, moment-based refinement, or optimization—used in metrology and calibration.
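A sketch of the parabola-fit refinement, assuming three gradient magnitudes sampled at unit steps along the edge normal with the middle one being a local maximum. Fitting p(t) = at² + bt + c through (−1, m₋), (0, m₀), (1, m₊) puts the vertex at t* = (m₋ − m₊) / (2(m₋ − 2m₀ + m₊)); the function name is illustrative. The subpixel edge position is then the pixel position plus t* times the unit normal.

```python
def parabola_peak(m_minus, m_0, m_plus):
    """Vertex offset of the parabola through (-1, m_minus), (0, m_0), (1, m_plus).

    Assumes m_0 is a local maximum, so the offset lies in [-0.5, 0.5].
    """
    denom = m_minus - 2 * m_0 + m_plus
    if denom == 0:  # flat: no curvature, keep the integer position
        return 0.0
    return 0.5 * (m_minus - m_plus) / denom

def f(t):
    """Hypothetical magnitude profile whose true peak is at t = 0.3."""
    return 10 - (t - 0.3) ** 2

print(parabola_peak(f(-1), f(0), f(1)))  # recovers an offset of about 0.3
```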

Answer 80

Answer: Edges are local discontinuities; segmentation assigns each pixel to a region/object. Edges can guide segmentation (watershed, active contours, graphs).
