Computer Vision Interview
20 essential Q&A
Updated 2026
Image Transformations: 20 Essential Q&A
Geometric image transforms—when you need affine vs perspective, how warping works, and common interview pitfalls.
~11 min read
20 questions
Beginner–Intermediate
affine, homography, warp, interpolation
1
What is a geometric image transformation?
⚡ easy
Answer: A mapping that moves pixel locations—translation, rotation, scale, affine, or perspective—while optionally resampling intensities. It changes spatial layout but not the semantic label if the transform is label-consistent (e.g. bbox corners transformed too).
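A minimal sketch of the label-consistency point, assuming boxes are stored as (x0, y0, x1, y1) and the transform is a 2×3 affine matrix; the helper name `transform_bbox` is hypothetical. It moves all four corners and returns the axis-aligned box that encloses them.

```python
import numpy as np

def transform_bbox(box, M):
    """Map an axis-aligned box (x0, y0, x1, y1) through a 2x3 affine matrix M.

    Returns the axis-aligned box enclosing the four transformed corners.
    """
    x0, y0, x1, y1 = box
    corners = np.array([[x0, y0], [x1, y0], [x1, y1], [x0, y1]], dtype=float)
    ones = np.ones((4, 1))
    moved = np.hstack([corners, ones]) @ np.asarray(M, dtype=float).T  # shape (4, 2)
    xs, ys = moved[:, 0], moved[:, 1]
    return (xs.min(), ys.min(), xs.max(), ys.max())

# Illustrative matrix: rotate 90 degrees about the origin, then shift x by 10
M = np.array([[0.0, -1.0, 10.0],
              [1.0,  0.0,  0.0]])
new_box = transform_bbox((1, 2, 3, 6), M)  # (4.0, 1.0, 8.0, 3.0)
```

For rotations that are not multiples of 90 degrees the enclosing box is looser than the object, which is one reason some pipelines warp polygons or masks instead.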
2
Define translation of an image.
⚡ easy
Answer: Shifting all pixels by offsets (tx, ty). Implemented by moving the sampling grid or adjusting the transform matrix with identity + translation column. Boundaries may require padding or cropping.
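As a rough illustration, an integer shift can be written with plain array slicing. The function name `translate` and the constant `fill` padding are assumptions made for this sketch, not part of any library.

```python
import numpy as np

def translate(img, tx, ty, fill=0):
    """Shift an image array by integer offsets (tx, ty), padding exposed pixels with `fill`."""
    h, w = img.shape[:2]
    out = np.full_like(img, fill)
    # Window of source pixels that is still inside the canvas after the shift
    x0, x1 = max(0, -tx), min(w, w - tx)
    y0, y1 = max(0, -ty), min(h, h - ty)
    if x0 < x1 and y0 < y1:
        out[y0 + ty:y1 + ty, x0 + tx:x1 + tx] = img[y0:y1, x0:x1]
    return out

img = np.arange(12).reshape(3, 4)
shifted = translate(img, tx=1, ty=0)  # first column becomes padding
```

With OpenCV the same shift is usually expressed as the 2×3 matrix [[1, 0, tx], [0, 1, ty]] passed to cv2.warpAffine.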
3
What is isotropic vs anisotropic scaling?
⚡ easy
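Answer: Isotropic scaling uses the same factor on both axes (sx = sy), so shapes and angles are preserved. Anisotropic scaling uses sx ≠ sy, which stretches content and can turn circles into ellipses. It also changes the aspect ratio of bounding boxes, so detection labels must be scaled with the same per-axis factors.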
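A quick numerical check of the circle-to-ellipse claim, using sample points on a unit circle. The scale factors are arbitrary values chosen for illustration.

```python
import numpy as np

theta = np.linspace(0, 2 * np.pi, 8, endpoint=False)
circle = np.stack([np.cos(theta), np.sin(theta)], axis=1)  # (N, 2) points on a unit circle

iso = circle @ np.diag([2.0, 2.0])    # sx == sy
aniso = circle @ np.diag([2.0, 0.5])  # sx != sy

iso_radii = np.linalg.norm(iso, axis=1)      # all equal: still a circle
aniso_radii = np.linalg.norm(aniso, axis=1)  # vary between 0.5 and 2: an ellipse
```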
4
How is rotation about the origin represented in 2D?
📊 medium
Answer: Linear part is matrix [[cos θ, -sin θ],[sin θ, cos θ]]. In practice pick a rotation center (image center) via translate-rotate-translate composition. Large rotations need bigger canvas or cropping.
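The translate-rotate-translate composition can be sketched with 3×3 homogeneous matrices. The function name `rotation_about` is hypothetical, and points are treated as column vectors.

```python
import numpy as np

def rotation_about(center, theta):
    """3x3 homogeneous matrix rotating by theta radians about `center` = (cx, cy)."""
    cx, cy = center
    c, s = np.cos(theta), np.sin(theta)
    to_origin = np.array([[1, 0, -cx], [0, 1, -cy], [0, 0, 1]], dtype=float)
    rotate = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]], dtype=float)
    back = np.array([[1, 0, cx], [0, 1, cy], [0, 0, 1]], dtype=float)
    # Rightmost matrix acts first: move center to origin, rotate, move back
    return back @ rotate @ to_origin

R = rotation_about((50, 40), np.pi / 2)
center_image = R @ np.array([50.0, 40.0, 1.0])  # the center stays fixed
corner_image = R @ np.array([60.0, 40.0, 1.0])  # maps to (50, 50)
```

In image coordinates, where y points down, a positive θ in this matrix appears clockwise on screen, so check the sign convention of whichever library you use.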
5
What does flipping do for ML?
⚡ easy
Answer: Horizontal flip is a common label-preserving augmentation for many object classes; vertical flip may break semantics (people, text, traffic scenes). Always validate against dataset semantics.
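A sketch of a horizontal flip that keeps detection labels consistent. It assumes boxes are (x0, y0, x1, y1) measured as pixel edges in the range [0, width]; other box conventions need a different offset. The helper name is hypothetical.

```python
import numpy as np

def hflip_with_boxes(img, boxes):
    """Flip an image left-right and mirror its boxes (x0, y0, x1, y1) accordingly."""
    w = img.shape[1]
    flipped = img[:, ::-1].copy()
    boxes = np.asarray(boxes, dtype=float)
    out = boxes.copy()
    out[:, 0] = w - boxes[:, 2]  # new left edge comes from the old right edge
    out[:, 2] = w - boxes[:, 0]  # new right edge comes from the old left edge
    return flipped, out

img = np.arange(8).reshape(2, 4)
flipped, new_boxes = hflip_with_boxes(img, [[0, 0, 1, 2]])
```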
6
Homogeneous coordinates for 2D transforms?
📊 medium
Answer: Represent point (x,y) as (x,y,1). Allows affine maps as 3×3 matrices acting on homogeneous vectors, unifying translation with linear maps for composition.
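To make this concrete, the sketch below lifts points to (x, y, 1), applies a pure translation as a matrix product, and drops back to 2D. The helper names are illustrative only.

```python
import numpy as np

def to_homogeneous(pts):
    """Append a 1 to each (x, y) row."""
    pts = np.asarray(pts, dtype=float)
    return np.hstack([pts, np.ones((len(pts), 1))])

def from_homogeneous(pts_h):
    """Divide by the last coordinate and drop it."""
    return pts_h[:, :2] / pts_h[:, 2:3]

# Translation by (5, -2) written as a 3x3 matrix
T = np.array([[1.0, 0.0, 5.0],
              [0.0, 1.0, -2.0],
              [0.0, 0.0, 1.0]])

pts = [(0, 0), (1, 1), (2, 3)]
moved = from_homogeneous(to_homogeneous(pts) @ T.T)
```

The origin moves to (5, -2) here, which no 2×2 matrix acting on (x, y) alone can do.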
7
What is an affine transformation?
📊 medium
Answer: A linear transform plus a translation, covering rotation, scale, and shear. It maps parallel lines to parallel lines and preserves ratios of lengths along a line, but not lengths or angles unless the map is constrained to a similarity (angles) or Euclidean (lengths and angles) transform.
8
How many degrees of freedom does a 2D affine map have?
📊 medium
Answer: Six (4 in the 2×2 linear part + 2 translation). Each point correspondence gives two equations, so 3 non-collinear correspondences determine the map in general.
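The counting argument can be checked by solving the resulting linear system directly. This is a minimal sketch with a hypothetical helper name; OpenCV offers cv2.getAffineTransform for the same task.

```python
import numpy as np

def affine_from_3_points(src, dst):
    """Solve for the 2x3 matrix M with M @ [x, y, 1] = [x', y'] for three point pairs."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    A = np.hstack([src, np.ones((3, 1))])  # 3x3; singular if the source points are collinear
    # A @ M.T = dst, so M.T = solve(A, dst)
    return np.linalg.solve(A, dst).T

src = [(0, 0), (1, 0), (0, 1)]
dst = [(2, 3), (4, 3), (2, 6)]
M = affine_from_3_points(src, dst)  # [[2, 0, 2], [0, 3, 3]]
```

With collinear source points np.linalg.solve raises LinAlgError, which is the degenerate case mentioned above.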
9
How does perspective differ from affine?
🔥 hard
Answer: Projective maps preserve collinearity but not parallelism—parallel world lines can converge in the image (vanishing points). Needed for planes viewed at an angle, document scanning, and bird’s-eye view from ground cameras.
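A small numerical example of parallel lines converging. The matrix H below is an arbitrary illustrative homography whose only non-affine term is in the last row.

```python
import numpy as np

H = np.array([[1.0, 0.0,   0.0],
              [0.0, 1.0,   0.0],
              [0.0, 0.002, 1.0]])  # nonzero entry in the last row makes it projective

def apply_h(H, pts):
    """Apply a 3x3 homography to (N, 2) points, including the perspective divide."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return pts_h[:, :2] / pts_h[:, 2:3]

def cross2(a, b):
    return a[0] * b[1] - a[1] * b[0]

# Two parallel vertical lines, x = 0 and x = 100, sampled at y = 0 and y = 100
left = apply_h(H, np.array([[0.0, 0.0], [0.0, 100.0]]))
right = apply_h(H, np.array([[100.0, 0.0], [100.0, 100.0]]))

still_parallel = np.isclose(cross2(left[1] - left[0], right[1] - right[0]), 0.0)

# The shared direction (0, 1) is a point at infinity; its image is the vanishing point
vp_h = H @ np.array([0.0, 1.0, 0.0])
vanishing_point = vp_h[:2] / vp_h[2]  # (0, 500)
```

Setting the 0.002 entry to zero turns H into an affine map, and the two lines stay parallel.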
10
What is a homography?
🔥 hard
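Answer: A 3×3 projective transform, defined up to scale (8 DOF), that maps one plane to another under pinhole imaging. It relates two views of the same planar surface. Each point correspondence gives two equations, so at least 4 correspondences with no three points collinear are needed; DLT is the standard linear solver.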
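A bare-bones DLT sketch, assuming noise-free correspondences. It skips the coordinate normalization and outlier rejection that real pipelines add, and the function name is hypothetical; in OpenCV, cv2.findHomography is the usual entry point.

```python
import numpy as np

def homography_dlt(src, dst):
    """Estimate H (3x3, up to scale) from >= 4 correspondences with the basic DLT."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Two linear equations per correspondence in the 9 entries of H
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.asarray(rows, dtype=float)
    # Solution: right singular vector belonging to the smallest singular value
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]  # fix the free scale (assumes H[2, 2] != 0)

# Synthetic check: project a square with a known H, then recover it
H_true = np.array([[1.0, 0.2, 5.0], [0.1, 1.1, -3.0], [0.001, 0.002, 1.0]])
src = np.array([[0, 0], [100, 0], [100, 100], [0, 100]], dtype=float)
src_h = np.hstack([src, np.ones((4, 1))]) @ H_true.T
dst = src_h[:, :2] / src_h[:, 2:3]
H_est = homography_dlt(src, dst)
```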
11
Forward vs inverse warping?
📊 medium
Answer: Forward warping maps each source pixel to the destination, which can leave holes and overlaps. Inverse warping visits each destination pixel and samples the source through the inverse map, which avoids gaps. It is the standard approach in OpenCV's warp* functions, with a chosen interpolator.
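Inverse warping can be sketched in a few lines of NumPy. This version uses nearest-neighbor sampling on a single-channel array and a hypothetical function name; it is meant to show the loop direction, not to replace a library warp.

```python
import numpy as np

def warp_inverse_nn(src, M_inv, out_shape, fill=0):
    """For each output pixel (x, y), look up src at M_inv @ (x, y, 1) with nearest-neighbor."""
    out_h, out_w = out_shape
    ys, xs = np.mgrid[0:out_h, 0:out_w]
    dst_pts = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])  # (3, N)
    src_pts = M_inv @ dst_pts
    sx = np.rint(src_pts[0] / src_pts[2]).astype(int)
    sy = np.rint(src_pts[1] / src_pts[2]).astype(int)
    # Output pixels whose source location falls outside the image keep the fill value
    valid = (sx >= 0) & (sx < src.shape[1]) & (sy >= 0) & (sy < src.shape[0])
    out = np.full(out_h * out_w, fill, dtype=src.dtype)
    out[valid] = src[sy[valid], sx[valid]]
    return out.reshape(out_h, out_w)

src = np.array([[1, 2, 3],
                [4, 5, 6]])
M = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])  # shift right by 1
out = warp_inverse_nn(src, np.linalg.inv(M), (2, 3))
```

Every output pixel is assigned exactly once, so there are no holes or overlaps to resolve.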
12
Why does warping need interpolation?
📊 medium
Answer: Mapped coordinates land between pixels. Nearest, bilinear, bicubic choose neighborhood weights—trade speed vs aliasing/blur. Downscaling may need prefiltering to avoid aliasing.
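import cv2
import numpy as np
img = np.zeros((240, 320, 3), np.uint8)  # placeholder image; load a real one with cv2.imread
h, w = img.shape[:2]
M = cv2.getRotationMatrix2D((w / 2, h / 2), 30, 1.0)  # center (x, y), angle in degrees, scale
out = cv2.warpAffine(img, M, (w, h), flags=cv2.INTER_LINEAR)  # dsize is (width, height)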
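For intuition about the neighborhood weights, here is a hand-written bilinear sampler for a single-channel image. It is a sketch that assumes the sample point lies inside the image; the function name is illustrative.

```python
import numpy as np

def bilinear_sample(img, x, y):
    """Sample a 2D array at fractional (x, y), assuming 0 <= x <= w-1 and 0 <= y <= h-1."""
    h, w = img.shape
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0  # fractional offsets act as blend weights
    top = (1 - ax) * img[y0, x0] + ax * img[y0, x1]
    bottom = (1 - ax) * img[y1, x0] + ax * img[y1, x1]
    return (1 - ay) * top + ay * bottom

img = np.array([[0.0, 10.0],
                [20.0, 30.0]])
center = bilinear_sample(img, 0.5, 0.5)      # 15.0, the mean of all four pixels
off_grid = bilinear_sample(img, 0.25, 0.75)  # 17.5; nearest-neighbor would return 20.0
```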
13
Crop vs pad after transform?
⚡ easy
Answer: Rotation/scale can push content outside the original canvas—either expand canvas with padding (constant, reflect) or crop to a fixed size. Detection boxes must be clipped or transformed consistently.
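When expanding the canvas for a rotation, the required size follows from the rotated corner extents. The helper below is a hypothetical sketch; in practice the result is rounded up to whole pixels.

```python
import numpy as np

def rotated_canvas_size(w, h, theta):
    """Width and height of the axis-aligned box enclosing a w x h image rotated by theta radians."""
    c, s = abs(np.cos(theta)), abs(np.sin(theta))
    return w * c + h * s, w * s + h * c

quarter_turn = rotated_canvas_size(200, 100, np.pi / 2)  # width and height swap
diagonal = rotated_canvas_size(200, 100, np.pi / 4)      # both equal 300 / sqrt(2)
```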
14
Augmentation: random affine on segmentation masks?
📊 medium
Answer: Apply the same spatial map to image and mask (nearest-neighbor interpolation for label masks to avoid fractional classes). For instance segmentation, warp polygons or rasterize after transform.
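The "fractional classes" problem is easy to reproduce on one row of a label mask. The class ids and sample positions below are made up for illustration.

```python
import numpy as np

labels = np.array([0, 0, 3, 3])                   # class ids along one mask row
positions = np.array([0.0, 1.0, 1.5, 2.0, 3.0])   # fractional sample positions after a warp

# Linear interpolation blends class ids and invents a value at the boundary
linear = np.interp(positions, np.arange(len(labels)), labels)

# Nearest-neighbor only ever returns ids that exist in the mask
nearest = labels[np.floor(positions + 0.5).astype(int)]
```

Here `linear` contains 1.5 at the class boundary, which is not a valid label. In OpenCV the corresponding choice is the cv2.INTER_NEAREST flag for the mask.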
15
What is image registration?
🔥 hard
Answer: Aligning two images of the same scene into a common coordinate frame—via feature matching + homography/affine, optical flow, or optimization. Used in medical imaging, panorama stitching, and super-resolution.
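As a toy instance of the optimization route, the sketch below registers two images that differ by an integer translation, using an exhaustive search over shifts with a sum-of-squared-differences cost. It assumes circular shifts (np.roll) and a small search window, both simplifications; the function name is hypothetical.

```python
import numpy as np

def register_translation(fixed, moving, max_shift=5):
    """Find the integer (dx, dy) that best aligns `moving` to `fixed` by exhaustive SSD search."""
    best, best_err = (0, 0), np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            candidate = np.roll(moving, shift=(dy, dx), axis=(0, 1))
            err = np.sum((candidate.astype(float) - fixed) ** 2)
            if err < best_err:
                best, best_err = (dx, dy), err
    return best

rng = np.random.default_rng(0)
fixed = rng.random((32, 32))
moving = np.roll(fixed, shift=(-2, 3), axis=(0, 1))  # fixed moved up 2 rows and right 3 columns
dx, dy = register_translation(fixed, moving)          # correction that undoes the shift
```

Real registration replaces the brute-force loop with feature matching, flow, or a gradient-based optimizer, and the translation with a richer model such as an affine map or homography.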
16
What is a similarity transform?
📊 medium
Answer: Rotation + uniform scale + translation (4 DOF in 2D). Preserves angles and ratios of lengths—good model when perspective effects are weak.
17
What is a rigid (Euclidean) transform?
⚡ easy
Answer: Rotation + translation only—preserves distances and angles (3 DOF in 2D). Models camera motion parallel to the plane or object pose without scale change.
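The difference between the rigid and similarity models shows up directly in side lengths. This sketch builds both from one hypothetical helper, with arbitrary parameter values.

```python
import numpy as np

def similarity_matrix(theta, scale, tx, ty):
    """3x3 similarity transform; scale=1 gives a rigid (Euclidean) transform."""
    c, s = scale * np.cos(theta), scale * np.sin(theta)
    return np.array([[c, -s, tx], [s, c, ty], [0.0, 0.0, 1.0]])

def apply(M, pts):
    pts_h = np.hstack([pts, np.ones((len(pts), 1))]) @ M.T
    return pts_h[:, :2]  # last row of M is (0, 0, 1), so no divide is needed

def side_lengths(tri):
    return np.linalg.norm(tri - np.roll(tri, -1, axis=0), axis=1)

tri = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 3.0]])  # side lengths 4, 5, 3
rigid = apply(similarity_matrix(0.7, 1.0, 10.0, -5.0), tri)
similar = apply(similarity_matrix(0.7, 2.0, 10.0, -5.0), tri)
```

`side_lengths(rigid)` matches the original triangle, while `side_lengths(similar)` is doubled on every side, so length ratios and angles are unchanged.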
18
How do you compose transforms?
📊 medium
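Answer: Multiply their homogeneous matrices. With column vectors the rightmost matrix is applied first, so "A, then B" is written B·A; libraries that use row vectors reverse the order. Composition is not commutative in general, so stay consistent with your library's convention.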
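A two-matrix example makes the ordering concrete. It uses the column-vector convention; the translation and scale values are arbitrary.

```python
import numpy as np

T = np.array([[1, 0, 10], [0, 1, 0], [0, 0, 1]], dtype=float)  # translate x by 10
S = np.array([[2, 0, 0], [0, 2, 0], [0, 0, 1]], dtype=float)   # scale by 2

p = np.array([1.0, 1.0, 1.0])
scale_then_translate = (T @ S) @ p  # S acts first: (1, 1) -> (2, 2) -> (12, 2)
translate_then_scale = (S @ T) @ p  # T acts first: (1, 1) -> (11, 1) -> (22, 2)
```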
19
OpenCV: warpAffine vs warpPerspective?
⚡ easy
Answer: warpAffine uses a 2×3 affine map; warpPerspective uses full 3×3 homography. Choose based on whether parallelism must be preserved (affine) or full perspective correction is needed.
20
Are lens distortion and homography the same?
📊 medium
Answer: No—radial/tangential distortion is nonlinear and modeled separately (Brown-Conrady) before or jointly with pinhole projection. Undistort first, then apply homography for many planar AR/document pipelines.
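To see why distortion cannot be folded into a homography, the sketch below applies a Brown-Conrady style model to three collinear points. It assumes normalized image coordinates and the common k1, k2, p1, p2 parameterization; the coefficient value is illustrative.

```python
import numpy as np

def distort(xy, k1=0.0, k2=0.0, p1=0.0, p2=0.0):
    """Apply radial (k1, k2) and tangential (p1, p2) distortion to normalized (N, 2) points."""
    x, y = xy[:, 0], xy[:, 1]
    r2 = x * x + y * y
    radial = 1 + k1 * r2 + k2 * r2 * r2
    xd = x * radial + 2 * p1 * x * y + p2 * (r2 + 2 * x * x)
    yd = y * radial + p1 * (r2 + 2 * y * y) + 2 * p2 * x * y
    return np.stack([xd, yd], axis=1)

# Three collinear points on the line y = 0.5
line = np.array([[-0.5, 0.5], [0.0, 0.5], [0.5, 0.5]])
bent = distort(line, k1=-0.3)  # barrel-type distortion

# 2D cross product of the two edge vectors; zero would mean still collinear
a, b = bent[1] - bent[0], bent[2] - bent[0]
collinearity_error = a[0] * b[1] - a[1] * b[0]
```

The nonzero `collinearity_error` shows the straight line has bent, whereas any homography keeps collinear points collinear.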
Transforms Cheat Sheet
Models
- Euclidean (3 DOF) → Similarity (4 DOF)
- Affine (6 DOF)
- Projective / H (8 DOF)
Warping
- Inverse sampling
- Interpolation choice
- Same transform for masks
Uses
- Augmentation
- Stitching / BEV
- Undistort + pinhole
💡 Pro tip: State affine vs projective using parallelism and vanishing points.