Computer Vision Interview
Updated 2026
Image Processing Basics: 20 Essential Q&A
Digital image fundamentals—how pixels, sampling, quantization, and storage show up in interviews.
~10 min read
20 questions
Beginner
pixels · resolution · channels · JPEG/PNG · NumPy/OpenCV
Quick Navigation
1. What is a digital image?
2. What is a pixel?
3. Sampling vs quantization
4. Resolution & aspect ratio
5. Image channels
6. Grayscale from RGB
7. Bit depth & dynamic range
1
What is a digital image in computer vision?
⚡ easy
Answer: A 2D (or 2D+channels) grid of samples where each cell is a pixel storing numeric intensity or color. It is a discrete approximation of a continuous scene after capture by a sensor and analog-to-digital conversion.
2
What is a pixel?
⚡ easy
Answer: The smallest addressable element of a raster image. Each pixel holds one or more values (e.g. gray level or R,G,B). Spatially, pixels sit on a regular grid; physically, they correspond to sensor photosites plus processing (demosaicing for color cameras).
3
Explain sampling and quantization.
📊 medium
Answer: Sampling chooses discrete spatial locations (grid resolution). Quantization maps continuous intensity to finite levels (bit depth). Together they convert a continuous image to digital form and introduce spatial and intensity approximation error.
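Both steps can be demonstrated in a few lines of NumPy. This is a minimal sketch on a synthetic ramp image (illustrative values only): striding the array stands in for coarser spatial sampling, and rounding to a handful of levels stands in for 2-bit quantization.

```python
import numpy as np

# Synthetic "continuous" scene approximated on a fine grid: a horizontal ramp.
fine = np.linspace(0.0, 1.0, 256).reshape(1, -1).repeat(256, axis=0)  # 256x256 floats in [0, 1]

# Sampling: keep every 4th row/column (a coarser spatial grid).
sampled = fine[::4, ::4]                                  # 64x64

# Quantization: map [0, 1] floats to 4 discrete levels (2-bit depth).
levels = 4
quantized = np.floor(sampled * levels).clip(0, levels - 1) / (levels - 1)

print(sampled.shape)         # (64, 64)
print(len(np.unique(quantized)))  # 4 distinct intensity levels
```

Note that both operations lose information: striding discards spatial detail, and quantization discards intensity detail, which is exactly the approximation error the answer mentions.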
4
What is image resolution?
⚡ easy
Answer: Usually the grid size width × height in pixels (e.g. 1920×1080). Higher resolution preserves finer detail but costs memory and compute. Aspect ratio is width/height; changing resolution without preserving ratio stretches content.
5
What are color channels?
⚡ easy
Answer: Separate 2D arrays (or stacked planes) per color component—commonly R, G, B for display. Grayscale has one channel. Multispectral/hyperspectral images have many bands beyond visible RGB.
6
How is grayscale often computed from RGB?
⚡ easy
Answer: A weighted sum approximating luminance, e.g. 0.299R + 0.587G + 0.114B (ITU-R BT.601) or simpler averages for rough work. Weights reflect human sensitivity to green; the exact formula depends on standard and use case.
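The BT.601 weighted sum is a one-liner in NumPy. A minimal sketch on a tiny synthetic image (the pixel values are illustrative only):

```python
import numpy as np

# Tiny synthetic RGB image of shape (H, W, 3): pure red, green, blue pixels.
rgb = np.array([[[255, 0, 0], [0, 255, 0], [0, 0, 255]]], dtype=np.uint8)

# ITU-R BT.601 luma weights: green dominates, matching human sensitivity.
weights = np.array([0.299, 0.587, 0.114])
gray = (rgb.astype(np.float64) @ weights).round().astype(np.uint8)

print(gray)  # [[ 76 150  29]] — pure green maps brightest
```

In OpenCV the equivalent is `cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)` (note the BGR channel order there).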
7
What is bit depth? Why does it matter?
📊 medium
Answer: Bits per channel (e.g. 8-bit → 256 levels). Higher depth reduces banding and helps medical/raw workflows; 8-bit uint is standard for web and many CV datasets. HDR may use 16/32-bit float linear pipelines before tone mapping.
8
How are pixel coordinates usually indexed?
⚡ easy
Answer: Often (row, col) or (y, x) with origin at top-left, row increasing downward—matching matrix indexing in NumPy/OpenCV. Be careful when converting to math coordinates where y may increase upward.
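A quick NumPy sketch of the row-first convention (array dimensions are illustrative):

```python
import numpy as np

img = np.zeros((480, 640), dtype=np.uint8)  # rows (height) x cols (width)

# NumPy/OpenCV convention: img[row, col] == img[y, x], origin at top-left.
y, x = 10, 200
img[y, x] = 255

print(img[10, 200])   # 255 — row index first, then column
print(img.shape)      # (480, 640): (H, W), not (W, H)
```

Writing `img[x, y]` here would silently address the wrong pixel (or raise an IndexError near the borders), which is the classic bug the answer warns about.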
9
What does tensor shape (H, W, C) mean?
📊 medium
Answer: Height (rows), width (columns), channels—typical for NumPy/OpenCV images. PyTorch often uses (N, C, H, W) for batches. Interviews check you can transpose between layouts without mixing H/W.
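Converting between the two layouts is a single axis permutation. A minimal sketch in NumPy (in PyTorch itself the analogous call is `tensor.permute(2, 0, 1)`):

```python
import numpy as np

hwc = np.zeros((480, 640, 3), dtype=np.uint8)   # NumPy/OpenCV layout: (H, W, C)

# Reorder axes to the (C, H, W) layout PyTorch models expect.
chw = np.transpose(hwc, (2, 0, 1))

print(hwc.shape)  # (480, 640, 3)
print(chw.shape)  # (3, 480, 640)
```

A useful sanity check in interviews: after any layout change, confirm that the height (480) and width (640) landed in the axes you intended rather than getting swapped.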
10
Raster vs vector graphics?
⚡ easy
Answer: Raster: pixel grid (photos, textures). Vector: curves/paths (SVG, fonts)—infinite resolution until rasterized. CV pipelines usually consume raster tensors; vector assets are rasterized for learning.
11
When to choose JPEG vs PNG?
⚡ easy
Answer: JPEG: photos, smaller files, lossy, poor for sharp edges/text. PNG: lossless, transparency, screenshots and graphics. For repeated ML saves, beware JPEG compression artifacts affecting edges and noise.
12
What problems can lossy compression cause for CV?
📊 medium
Answer: Blocking, ringing, color bleeding—especially around edges. Models may overfit artifact patterns. For training data, prefer lossless or high-quality JPEG; for deployment, know your camera/codec pipeline.
13
What is aliasing when downsampling?
📊 medium
Answer: High-frequency detail folds into low frequencies as moiré or jaggies if you shrink without low-pass filtering. Fix: blur then downsample, or use a resampling mode designed for it (e.g. cv2.INTER_AREA for downscaling in OpenCV).
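The effect is easy to reproduce in pure NumPy on a worst-case pattern. This sketch uses one-pixel stripes (synthetic data): naive striding destroys the signal entirely, while a 2x2 block average (the idea behind area interpolation) low-passes it to the correct mid-gray.

```python
import numpy as np

# High-frequency test pattern: alternating 0/255 columns (one-pixel stripes).
img = np.tile(np.array([0, 255], dtype=np.float64), 128).reshape(1, -1).repeat(4, axis=0)  # 4x256

# Naive 2x downsample (drop every other column): stripes alias to a flat black image.
naive = img[:, ::2]

# Area-style 2x downsample: average each 2x2 block first (a simple low-pass filter).
h, w = img.shape
area = img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

print(np.unique(naive))  # [0.] — all stripe detail collapsed to one value (aliasing)
print(np.unique(area))   # [127.5] — stripes correctly average to mid-gray
```

Real resizers generalize this: blur with a kernel matched to the scale factor, then sample.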
14
Nearest-neighbor vs bilinear interpolation?
📊 medium
Answer: Nearest: fast, blocky, preserves original values. Bilinear: smooths using 4 neighbors, better for resizing/rotation but blurs fine detail. Bicubic is smoother still; choice affects augmentation and geometric transforms.
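The contrast between the two can be shown in NumPy without any imaging library. A minimal sketch on a 2x2 toy image (values illustrative): nearest-neighbor upscaling just repeats pixels, while a bilinear sample at the grid center blends all four neighbors.

```python
import numpy as np

src = np.array([[0, 100],
                [50, 200]], dtype=np.float64)

# Nearest-neighbor 2x upscale: each pixel is repeated (blocky, original values preserved).
nearest = np.repeat(np.repeat(src, 2, axis=0), 2, axis=1)

# Bilinear sample at the grid center (y, x) = (0.5, 0.5): all four weights are 0.25.
center = 0.25 * (src[0, 0] + src[0, 1] + src[1, 0] + src[1, 1])

print(nearest.shape)  # (4, 4)
print(center)         # 87.5 — a blended value not present in the source
```

That last point is why nearest-neighbor is preferred for label masks (no invented class values) while bilinear is preferred for the images themselves.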
15
Typical dtypes for images in NumPy?
⚡ easy
Answer: uint8 [0,255] most common. Float images may be [0,1] or [0,255] depending on library—always normalize consistently before math or neural nets.
import numpy as np
img = np.zeros((480, 640, 3), dtype=np.uint8)   # (H, W, C), values in 0..255
img_f = img.astype(np.float32) / 255.0          # normalized float image in [0, 1]
16
Why does OpenCV use BGR?
⚡ easy
Answer: Historical reasons: early camera drivers and Windows bitmap formats delivered BGR, and OpenCV kept that convention, so cv2.imread returns BGR order. Convert to RGB for matplotlib or PIL-centric code: cv2.cvtColor(img, cv2.COLOR_BGR2RGB). Mixing orders is a common interview “debugging” trap.
import cv2
bgr = cv2.imread('x.jpg')
rgb = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)
17
What is the alpha channel?
⚡ easy
Answer: Per-pixel opacity for compositing (RGBA). Not always present. When loading into 3-channel models, you typically drop alpha or premultiply it into RGB, depending on the graphics pipeline.
18
What does an image histogram show?
📊 medium
Answer: The distribution of pixel intensities (per channel or gray). Useful for exposure diagnosis, thresholding intuition, and contrast enhancement—foundation for histogram equalization (covered in later chapters).
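A histogram is just a count per gray level. This sketch builds one with NumPy on a synthetic underexposed image (random data, seed fixed for reproducibility) and reads off the exposure diagnosis directly:

```python
import numpy as np

# Synthetic underexposed grayscale image: all intensities clustered near black.
rng = np.random.default_rng(0)
img = rng.integers(0, 64, size=(100, 100), dtype=np.uint8)  # values 0..63 only

# Histogram over the full 8-bit range: 256 bins, one per gray level.
hist = np.bincount(img.ravel(), minlength=256)

print(hist.sum())       # 10000 — one count per pixel
print(hist[64:].sum())  # 0 — nothing above 63: classic underexposure signature
```

An empty upper range like this is exactly what histogram equalization or contrast stretching would redistribute.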
19
How does a video relate to images?
⚡ easy
Answer: A sequence of frames (2D images) sampled in time with a frame rate (FPS). Temporal redundancy enables compression and tracking; many CV models treat frames independently at first.
20
What is EXIF metadata?
⚡ easy
Answer: Embedded tags in JPEG/TIFF: orientation, camera settings, timestamp, GPS. The orientation tag can rotate images—some loaders ignore it, causing inconsistent training data; preprocess to canonical orientation.
Image Basics Cheat Sheet
Representation
- Grid of pixels
- Sampling + quantization
- H×W×C / dtypes
Quality
- Resolution & aspect
- Aliasing on resize
- JPEG artifacts
Code pitfalls
- BGR vs RGB
- float range [0,1] vs [0,255]
- (row,col) vs (x,y)
💡 Pro tip: State image shape, dtype, and color order before any algorithm.
Full tutorial track
Go deeper with the matching tutorial chapter and code examples.