Computer Vision Interview 20 essential Q&A Updated 2026

Image Transformations: 20 Essential Q&A

Geometric image transforms—when you need affine vs perspective, how warping works, and common interview pitfalls.

~11 min read 20 questions Beginner–Intermediate
affine · homography · warp · interpolation
1 What is a geometric image transformation? ⚡ easy
Answer: A mapping that moves pixel locations—translation, rotation, scale, affine, or perspective—while optionally resampling intensities. It changes spatial layout but not the semantic label if the transform is label-consistent (e.g. bbox corners transformed too).
2 Define translation of an image. ⚡ easy
Answer: Shifting all pixels by offsets (tx, ty). Implemented by moving the sampling grid or adjusting the transform matrix with identity + translation column. Boundaries may require padding or cropping.
3 What is isotropic vs anisotropic scaling? ⚡ easy
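Answer: Isotropic: the same scale factor on both axes (sx = sy), which preserves angles and aspect ratio. Anisotropic: sx ≠ sy, which stretches content and can turn circles into ellipses. For detection labels, anisotropic scaling changes box aspect ratios, so scale box coordinates with the same per-axis factors.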
4 How is rotation about the origin represented in 2D? 📊 medium
Answer: Linear part is matrix [[cos θ, -sin θ],[sin θ, cos θ]]. In practice pick a rotation center (image center) via translate-rotate-translate composition. Large rotations need bigger canvas or cropping.
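The translate-rotate-translate composition can be sketched with plain NumPy. This is a minimal illustration, not OpenCV's implementation: the helper name rotation_about and the example center and angle are assumptions. It uses the matrix from the answer as written, so with image coordinates (y pointing down) a positive angle appears clockwise on screen, which is the opposite of the sign convention used by cv2.getRotationMatrix2D.

```python
import numpy as np

def rotation_about(center, theta):
    """3x3 homogeneous matrix rotating by theta radians about `center`."""
    cx, cy = center
    c, s = np.cos(theta), np.sin(theta)
    to_origin = np.array([[1, 0, -cx], [0, 1, -cy], [0, 0, 1]], dtype=float)
    rotate = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]], dtype=float)
    back = np.array([[1, 0, cx], [0, 1, cy], [0, 0, 1]], dtype=float)
    # Column-vector convention: the rightmost matrix is applied first.
    return back @ rotate @ to_origin

# Example values (assumed): rotate 90 degrees about the point (50, 40).
M = rotation_about((50.0, 40.0), np.pi / 2)
center_out = M @ np.array([50.0, 40.0, 1.0])  # the center stays where it is
corner_out = M @ np.array([60.0, 40.0, 1.0])  # 10 px right of center -> 10 px below it
```

The rotation center is a fixed point of the composed matrix, which is a quick sanity check when debugging augmentation code.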
5 What does flipping do for ML? ⚡ easy
Answer: Horizontal flip is a common label-preserving augmentation for many object classes; vertical flip may break semantics (people, text, traffic scenes). Always validate against dataset semantics.
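A horizontal flip only preserves labels if the annotations are flipped too. The sketch below assumes boxes are stored as [x_min, y_min, x_max, y_max] in continuous pixel-edge coordinates spanning [0, width]; the function name hflip_with_boxes is hypothetical, and other box conventions (for example inclusive integer pixel indices) need a different offset.

```python
import numpy as np

def hflip_with_boxes(img, boxes):
    """Flip an image left-right and update [x_min, y_min, x_max, y_max] boxes."""
    w = img.shape[1]
    flipped = img[:, ::-1].copy()
    boxes = np.asarray(boxes, dtype=float)
    out = boxes.copy()
    # After mirroring, the old right edge becomes the new left edge.
    out[:, 0] = w - boxes[:, 2]
    out[:, 2] = w - boxes[:, 0]
    return flipped, out

img = np.arange(24).reshape(4, 6)          # toy 4x6 "image"
flipped, new_boxes = hflip_with_boxes(img, [[1, 0, 3, 2]])
# The box spanning x in [1, 3] now spans x in [3, 5]; y is unchanged.
```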
6 Homogeneous coordinates for 2D transforms? 📊 medium
Answer: Represent point (x,y) as (x,y,1). Allows affine maps as 3×3 matrices acting on homogeneous vectors, unifying translation with linear maps for composition.
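As a small illustration of why homogeneous coordinates help, the sketch below packs a linear part A and a translation t into one 3×3 matrix and applies it to several points with a single matrix product. The values of A, t, and the points are arbitrary examples.

```python
import numpy as np

# Affine map x' = A x + t packed into one 3x3 matrix (last row is 0 0 1).
A = np.array([[2.0, 0.0], [0.0, 0.5]])   # example linear part: anisotropic scale
t = np.array([10.0, -3.0])               # example translation
M = np.eye(3)
M[:2, :2] = A
M[:2, 2] = t

points = np.array([[0.0, 0.0], [1.0, 2.0], [4.0, 4.0]])   # shape (N, 2)
homog = np.hstack([points, np.ones((len(points), 1))])    # append w = 1
mapped = (M @ homog.T).T[:, :2]                           # w stays 1, so just drop it
```

The result matches computing A x + t point by point; the benefit is that any chain of such maps collapses into one matrix.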
7 What is an affine transformation? 📊 medium
Answer: Maps parallel lines to parallel lines: combination of linear transform and translation—rotation, scale, shear. Preserves ratios along lines but not necessarily lengths or angles unless constrained (similarity/euclidean).
8 How many degrees of freedom does a 2D affine map have? 📊 medium
Answer: Six (4 in the 2×2 linear part + 2 translation). Each point correspondence gives two equations, so you need 3 non-collinear correspondences to solve for it exactly; with more, fit in the least-squares sense.
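The three-correspondence case can be written as a small linear solve. This is a sketch with a hypothetical helper name (estimate_affine) and made-up points; OpenCV offers cv2.getAffineTransform for the same task.

```python
import numpy as np

def estimate_affine(src, dst):
    """Solve for the 2x3 affine matrix mapping three src points onto dst."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    # Each correspondence gives [x, y, 1] @ M.T = [x', y'].
    X = np.hstack([src, np.ones((3, 1))])   # 3x3; singular if the points are collinear
    return np.linalg.solve(X, dst).T        # shape (2, 3)

src = [(0, 0), (1, 0), (0, 1)]
dst = [(2, 3), (4, 3), (2, 6)]
M = estimate_affine(src, dst)
# M is [[2, 0, 2], [0, 3, 3]]: scale x by 2, y by 3, then translate by (2, 3).
```

If the three source points are collinear, np.linalg.solve raises LinAlgError, which is the degenerate case the answer warns about.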
9 How does perspective differ from affine? 🔥 hard
Answer: Projective maps preserve collinearity but not parallelism—parallel world lines can converge in the image (vanishing points). Needed for planes viewed at an angle, document scanning, and bird’s-eye view from ground cameras.
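A small numeric check makes the loss of parallelism concrete. The matrix H below is an arbitrary example with a nonzero entry in its last row; the helper apply_h is a hypothetical name for "multiply, then divide by w".

```python
import numpy as np

def apply_h(H, pts):
    """Apply a 3x3 projective map to (N, 2) points, including the divide by w."""
    pts = np.asarray(pts, dtype=float)
    p = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return p[:, :2] / p[:, 2:3]

# Example projective map: the last row is not (0, 0, 1), so it is not affine.
H = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.002, 1.0]])

left = apply_h(H, [(100, 0), (100, 200)])    # two vertical, parallel segments
right = apply_h(H, [(300, 0), (300, 200)])
d1, d2 = left[1] - left[0], right[1] - right[0]
cross = d1[0] * d2[1] - d1[1] * d2[0]        # nonzero: no longer parallel

# The shared direction (0, 1) is the point at infinity (0, 1, 0);
# its image is the vanishing point where both lines converge.
vp = H @ np.array([0.0, 1.0, 0.0])
vanishing_point = vp[:2] / vp[2]             # (0, 500) for this H
```

With the last row set to (0, 0, 1) the same check gives a zero cross product, which is the affine case.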
10 What is a homography? 🔥 hard
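Answer: A 3×3 projective transform, defined up to scale (8 DOF), that maps one plane to another under pinhole imaging. It relates two views of the same planar surface. It can be estimated from at least 4 point correspondences with no three collinear, typically with the DLT; each correspondence contributes two linear equations.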
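The DLT can be sketched in a few lines of NumPy. This is a bare-bones version under stated assumptions: there is no coordinate normalization and no outlier handling, both of which practical pipelines usually add, and the function names and sample points are illustrative only.

```python
import numpy as np

def homography_dlt(src, dst):
    """Estimate H (3x3, up to scale) from >= 4 point correspondences."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # From u = (h1 x + h2 y + h3) / (h7 x + h8 y + h9), and likewise for v.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]   # fix the free scale; assumes H[2, 2] != 0

def apply_h(H, pts):
    """Apply H to (N, 2) points, including the perspective divide."""
    pts = np.asarray(pts, dtype=float)
    p = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return p[:, :2] / p[:, 2:3]

src = [(0, 0), (1, 0), (1, 1), (0, 1)]          # unit square
dst = [(0, 0), (2, 0), (1.5, 1), (0.5, 1)]      # trapezoid
H = homography_dlt(src, dst)
reprojected = apply_h(H, src)                   # should land on dst
```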
11 Forward vs inverse warping? 📊 medium
Answer: Forward: map each source pixel to the destination, which can leave holes and overlaps. Inverse: for each destination pixel, sample the source through the inverse map. This avoids gaps and is the standard approach in OpenCV's warp* functions, combined with a chosen interpolator.
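Inverse warping can be sketched as "loop over destination pixels, look up the source". The version below is a simplified illustration with nearest-neighbor sampling on a 2D array; the function name and the fill value are assumptions, and it is not how OpenCV is implemented internally.

```python
import numpy as np

def warp_inverse_nearest(src, M, out_shape, fill=0):
    """Warp a 2D array with a 3x3 forward map M, using inverse sampling."""
    h_out, w_out = out_shape
    M_inv = np.linalg.inv(M)
    ys, xs = np.mgrid[0:h_out, 0:w_out]
    dest = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])  # (3, N)
    sx, sy, sw = M_inv @ dest                 # where each dest pixel comes from
    sx = np.rint(sx / sw).astype(int)
    sy = np.rint(sy / sw).astype(int)
    inside = (sx >= 0) & (sx < src.shape[1]) & (sy >= 0) & (sy < src.shape[0])
    out = np.full(h_out * w_out, fill, dtype=src.dtype)
    out[inside] = src[sy[inside], sx[inside]]  # every dest pixel gets exactly one value
    return out.reshape(h_out, w_out)

src = np.arange(12).reshape(3, 4)
shift = np.array([[1, 0, 1], [0, 1, 0], [0, 0, 1]], dtype=float)  # move right by 1 px
out = warp_inverse_nearest(src, shift, src.shape)
# The first column has no source pixel and takes the fill value.
```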
12 Why does warping need interpolation? 📊 medium
Answer: Mapped coordinates land between pixels. Nearest, bilinear, bicubic choose neighborhood weights—trade speed vs aliasing/blur. Downscaling may need prefiltering to avoid aliasing.
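import cv2
img = cv2.imread("input.jpg")  # hypothetical path; imread returns None if unreadable
h, w = img.shape[:2]
# Example values: rotate 30 degrees about the image center, no scaling
M = cv2.getRotationMatrix2D((w / 2, h / 2), 30, 1.0)
out = cv2.warpAffine(img, M, (w, h), flags=cv2.INTER_LINEAR)  # bilinear sampling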
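To show what a bilinear interpolator computes at a fractional location, here is a minimal single-point sketch. The function name bilinear_sample is hypothetical, and the code assumes the query point lies inside the image; library implementations also handle borders and vectorize the lookup.

```python
import numpy as np

def bilinear_sample(img, x, y):
    """Sample a 2D array at fractional (x, y); assumes the point is in bounds."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1 = min(x0 + 1, img.shape[1] - 1)   # clamp at the right/bottom edge
    y1 = min(y0 + 1, img.shape[0] - 1)
    fx, fy = x - x0, y - y0              # fractional offsets in [0, 1)
    top = (1 - fx) * img[y0, x0] + fx * img[y0, x1]
    bottom = (1 - fx) * img[y1, x0] + fx * img[y1, x1]
    return (1 - fy) * top + fy * bottom

img = np.array([[0.0, 10.0],
                [20.0, 30.0]])
center = bilinear_sample(img, 0.5, 0.5)   # 15.0, the average of the four pixels
```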
13 Crop vs pad after transform? ⚡ easy
Answer: Rotation/scale can push content outside the original canvas—either expand canvas with padding (constant, reflect) or crop to a fixed size. Detection boxes must be clipped or transformed consistently.
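For the "expand the canvas" option, one way to size the output is to rotate the four image corners and take their bounding box. The sketch below assumes corners at (0, 0) and (w, h) and uses the rotation matrix [[cos θ, -sin θ], [sin θ, cos θ]]; the helper name rotated_canvas and the small tolerance are illustrative choices.

```python
import numpy as np

def rotated_canvas(w, h, theta):
    """Return (new_w, new_h, M) so a w x h image rotated by theta fits entirely."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    corners = np.array([[0, 0], [w, 0], [w, h], [0, h]], dtype=float)
    rotated = corners @ R.T
    mins = rotated.min(axis=0)
    extent = rotated.max(axis=0) - mins
    # The tiny tolerance keeps floating-point noise from adding an extra pixel.
    new_w, new_h = (int(v) for v in np.ceil(extent - 1e-9))
    # Translate so the rotated bounding box starts at (0, 0).
    M = np.hstack([R, -mins.reshape(2, 1)])   # 2x3 affine matrix
    return new_w, new_h, M

new_w, new_h, M = rotated_canvas(400, 200, np.pi / 2)   # 90 degrees -> 200 x 400
```

The returned 2×3 matrix is a forward (source to destination) map, so it should be usable with cv2.warpAffine(img, M, (new_w, new_h)); any bounding boxes need the same M applied to their corners.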
14 Augmentation: random affine on segmentation masks? 📊 medium
Answer: Apply the same spatial map to image and mask (nearest-neighbor interpolation for label masks to avoid fractional classes). For instance segmentation, warp polygons or rasterize after transform.
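The "fractional classes" problem is easy to reproduce on a single mask row. In this toy sketch (the class ids and sampling positions are made up), linear interpolation across a boundary between class 0 and class 2 produces the value 1, a class that is not present there, while nearest-neighbor sampling only returns existing ids.

```python
import numpy as np

labels = np.array([0, 0, 2, 2])              # class ids along one mask row
x_src = np.arange(len(labels))
x_dst = np.linspace(0, len(labels) - 1, 7)   # resample at half-pixel steps

linear = np.interp(x_dst, x_src, labels)       # blends ids across the boundary
nearest = labels[np.rint(x_dst).astype(int)]   # picks one existing id per sample
```

With OpenCV the same idea corresponds to passing flags=cv2.INTER_NEAREST when warping the mask while keeping a smoother interpolator for the image.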
15 What is image registration? 🔥 hard
Answer: Aligning two images of the same scene into a common coordinate frame—via feature matching + homography/affine, optical flow, or optimization. Used in medical imaging, panorama stitching, and super-resolution.
16 What is a similarity transform? 📊 medium
Answer: Rotation + uniform scale + translation (4 DOF in 2D). Preserves angles and ratios of lengths—good model when perspective effects are weak.
17 What is a rigid (Euclidean) transform? ⚡ easy
Answer: Rotation + translation only—preserves distances and angles (3 DOF in 2D). Models camera motion parallel to the plane or object pose without scale change.
18 How do you compose transforms? 📊 medium
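Answer: Multiply their homogeneous matrices. With column vectors (p' = M p), the rightmost matrix is applied first, so M = M2 · M1 applies M1 and then M2. Order matters because matrix multiplication is not commutative; stay consistent with your library's convention.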
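A two-matrix example shows why the order matters. The helpers translation and scaling are hypothetical names, and the column-vector convention is assumed throughout.

```python
import numpy as np

def translation(tx, ty):
    return np.array([[1, 0, tx], [0, 1, ty], [0, 0, 1]], dtype=float)

def scaling(s):
    return np.array([[s, 0, 0], [0, s, 0], [0, 0, 1]], dtype=float)

T = translation(10, 0)
S = scaling(2)
p = np.array([1.0, 1.0, 1.0])        # the point (1, 1) in homogeneous form

scale_then_translate = T @ S @ p     # S acts first: (2, 2), then +10 -> (12, 2)
translate_then_scale = S @ T @ p     # T acts first: (11, 1), then x2 -> (22, 2)
```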
19 OpenCV: warpAffine vs warpPerspective? ⚡ easy
Answer: warpAffine uses a 2×3 affine map; warpPerspective uses full 3×3 homography. Choose based on whether parallelism must be preserved (affine) or full perspective correction is needed.
20 Are lens distortion and homography the same? 📊 medium
Answer: No—radial/tangential distortion is nonlinear and modeled separately (Brown-Conrady) before or jointly with pinhole projection. Undistort first, then apply homography for many planar AR/document pipelines.

Transforms Cheat Sheet

Models
  • Euclidean (3 DOF) → Similarity (4 DOF)
  • Affine (6 DOF)
  • Projective / H (8 DOF)
Warping
  • Inverse sampling
  • Interpolation choice
  • Same transform for masks
Uses
  • Augmentation
  • Stitching / BEV
  • Undistort + pinhole

💡 Pro tip: State affine vs projective using parallelism and vanishing points.
