
Image Processing Basics: 20 Essential Q&A

Digital image fundamentals—how pixels, sampling, quantization, and storage show up in interviews.

1 What is a digital image in computer vision? ⚡ easy
Answer: A 2D (or 2D+channels) grid of samples where each cell is a pixel storing numeric intensity or color. It is a discrete approximation of a continuous scene after capture by a sensor and analog-to-digital conversion.
2 What is a pixel? ⚡ easy
Answer: The smallest addressable element of a raster image. Each pixel holds one or more values (e.g. gray level or R,G,B). Spatially, pixels sit on a regular grid; physically, they correspond to sensor photosites plus processing (demosaicing for color cameras).
3 Explain sampling and quantization. 📊 medium
Answer: Sampling chooses discrete spatial locations (grid resolution). Quantization maps continuous intensity to finite levels (bit depth). Together they convert a continuous image to digital form and introduce spatial and intensity approximation error.
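A minimal NumPy sketch of quantization error, assuming an 8-bit grayscale array; the level count of 4 is just an illustrative choice:

```python
import numpy as np

# Simulate an 8-bit grayscale ramp covering all 256 intensities.
img = np.arange(256, dtype=np.uint8).reshape(16, 16)

levels = 4                        # target number of intensity levels
step = 256 // levels              # width of each quantization bin
quantized = (img // step) * step  # map each pixel to its bin's floor value

print(np.unique(quantized))      # only 4 distinct values remain: 0, 64, 128, 192
```

Coarser quantization (fewer levels) shows up visually as banding in smooth gradients.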
4 What is image resolution? ⚡ easy
Answer: Usually the grid size width × height in pixels (e.g. 1920×1080). Higher resolution preserves finer detail but costs memory and compute. Aspect ratio is width/height; changing resolution without preserving ratio stretches content.
5 What are color channels? ⚡ easy
Answer: Separate 2D arrays (or stacked planes) per color component—commonly R, G, B for display. Grayscale has one channel. Multispectral/hyperspectral images have many bands beyond visible RGB.
6 How is grayscale often computed from RGB? ⚡ easy
Answer: A weighted sum approximating luminance, e.g. 0.299R + 0.587G + 0.114B (ITU-R BT.601) or simpler averages for rough work. Weights reflect human sensitivity to green; the exact formula depends on standard and use case.
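A minimal sketch of the BT.601 weighted sum in NumPy (the all-green test image is just for illustration):

```python
import numpy as np

# BT.601 luma weights for R, G, B; assumes an H x W x 3 RGB array.
weights = np.array([0.299, 0.587, 0.114])

rgb = np.zeros((2, 2, 3), dtype=np.uint8)
rgb[..., 1] = 255   # pure green image

# Weighted sum over the channel axis, rounded back to uint8.
gray = (rgb.astype(np.float32) @ weights).round().astype(np.uint8)
print(gray[0, 0])   # 150 -- green dominates perceived brightness
```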
7 What is bit depth? Why does it matter? 📊 medium
Answer: Bits per channel (e.g. 8-bit → 256 levels). Higher depth reduces banding and helps medical/raw workflows; 8-bit uint is standard for web and many CV datasets. HDR may use 16/32-bit float linear pipelines before tone mapping.
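One common way to down-convert a 16-bit image to 8-bit is to keep the high byte; a sketch (the sample values are illustrative):

```python
import numpy as np

# A 16-bit image, e.g. from a medical or raw-sensor pipeline.
img16 = np.array([[0, 256, 65535]], dtype=np.uint16)

# Right-shift by 8 bits keeps the high byte: 65535 -> 255.
img8 = (img16 >> 8).astype(np.uint8)
print(img8)   # [[  0   1 255]]
```

Note the information loss: 256 distinct 16-bit values collapse into each 8-bit level, which is exactly where banding comes from.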
8 How are pixel coordinates usually indexed? ⚡ easy
Answer: Often (row, col) or (y, x) with origin at top-left, row increasing downward—matching matrix indexing in NumPy/OpenCV. Be careful when converting to math coordinates where y may increase upward.
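A tiny sketch of the (row, col) convention in NumPy:

```python
import numpy as np

img = np.zeros((480, 640), dtype=np.uint8)  # 480 rows (y), 640 cols (x)

y, x = 10, 200
img[y, x] = 255      # NumPy/OpenCV arrays index as [row, col] == [y, x]
print(img.shape)     # (480, 640) -> (height, width), not (width, height)
```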
9 What does tensor shape (H, W, C) mean? 📊 medium
Answer: Height (rows), width (columns), channels—typical for NumPy/OpenCV images. PyTorch often uses (N, C, H, W) for batches. Interviews check you can transpose between layouts without mixing H/W.
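A sketch of converting between the two layouts with `np.transpose` (zeros stand in for real image data):

```python
import numpy as np

hwc = np.zeros((480, 640, 3), dtype=np.float32)  # H, W, C (NumPy/OpenCV style)

chw = np.transpose(hwc, (2, 0, 1))   # -> C, H, W (single sample, PyTorch style)
nchw = chw[np.newaxis]               # -> N, C, H, W (batch of 1)
print(nchw.shape)                    # (1, 3, 480, 640)
```

`transpose` permutes axes without copying; a wrong permutation silently swaps H and W, which is exactly the interview trap.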
10 Raster vs vector graphics? ⚡ easy
Answer: Raster: pixel grid (photos, textures). Vector: curves/paths (SVG, fonts)—infinite resolution until rasterized. CV pipelines usually consume raster tensors; vector assets are rasterized for learning.
11 When should you choose JPEG vs PNG? ⚡ easy
Answer: JPEG: photos, smaller files, lossy, poor for sharp edges/text. PNG: lossless, transparency, screenshots and graphics. For repeated ML saves, beware JPEG compression artifacts affecting edges and noise.
12 What problems can lossy compression cause for CV? 📊 medium
Answer: Blocking, ringing, color bleeding—especially around edges. Models may overfit artifact patterns. For training data, prefer lossless or high-quality JPEG; for deployment, know your camera/codec pipeline.
13 What is aliasing when downsampling? 📊 medium
Answer: High-frequency detail folds into low frequencies as moiré or jaggies if you shrink without low-pass filtering. Fix: blur then downsample or use good resampling (area interpolation for downscaling in OpenCV).
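A NumPy-only sketch of 2× area downsampling (average each 2×2 block, then keep one value), which is the low-pass-then-subsample idea behind OpenCV's INTER_AREA for integer factors:

```python
import numpy as np

img = np.arange(16, dtype=np.float32).reshape(4, 4)

# Average 2x2 blocks: reshape into (h/2, 2, w/2, 2) and mean over the block axes.
h, w = img.shape
small = img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
print(small.shape)   # (2, 2)
```

Subsampling with plain striding (`img[::2, ::2]`) skips the averaging step and is what produces aliasing.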
14 Nearest-neighbor vs bilinear interpolation? 📊 medium
Answer: Nearest: fast, blocky, preserves original values. Bilinear: smooths using 4 neighbors, better for resizing/rotation but blurs fine detail. Bicubic is smoother still; choice affects augmentation and geometric transforms.
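A sketch of why nearest-neighbor looks blocky: each pixel is copied, never blended (bilinear would instead produce intermediate values between 0 and 255):

```python
import numpy as np

img = np.array([[0, 255]], dtype=np.uint8)   # a 1 x 2 image

# Nearest-neighbor 2x upscale: repeat each pixel along both axes.
nn = np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)
print(nn)
# [[  0   0 255 255]
#  [  0   0 255 255]]
```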
15 Typical dtypes for images in NumPy? ⚡ easy
Answer: uint8 [0,255] most common. Float images may be [0,1] or [0,255] depending on library—always normalize consistently before math or neural nets.
import numpy as np
img = np.zeros((480, 640, 3), dtype=np.uint8)  # H, W, C
imgf = img.astype(np.float32) / 255.0          # normalize to [0, 1] before math/nets
16 Why does OpenCV use BGR? ⚡ easy
Answer: Historical reasons; imread returns BGR order. Convert to RGB for matplotlib or PIL-centric code: cv2.cvtColor(img, cv2.COLOR_BGR2RGB). Mixing orders is a common interview “debugging” trap.
import cv2
bgr = cv2.imread('x.jpg')
rgb = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)
17 What is the alpha channel? ⚡ easy
Answer: Per-pixel opacity for compositing (RGBA). Not always present. When loading to 3-channel models, you often drop alpha or premultiply RGB depending on graphics pipeline.
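A sketch of the simple "drop alpha" path for feeding 3-channel models (premultiplication is the alternative when compositing matters):

```python
import numpy as np

rgba = np.zeros((4, 4, 4), dtype=np.uint8)  # H, W, 4 (RGBA)
rgba[..., 3] = 255                          # fully opaque alpha channel

rgb = rgba[..., :3]                         # drop alpha, keep R, G, B
print(rgb.shape)                            # (4, 4, 3)
```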
18 What does an image histogram show? 📊 medium
Answer: The distribution of pixel intensities (per channel or gray). Useful for exposure diagnosis, thresholding intuition, and contrast enhancement—foundation for histogram equalization (covered in later chapters).
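A minimal histogram sketch with `np.bincount` (the tiny sample array is illustrative; for color images, compute one histogram per channel):

```python
import numpy as np

img = np.array([0, 0, 128, 255, 255, 255], dtype=np.uint8)

# 256-bin intensity histogram: hist[v] counts pixels with value v.
hist = np.bincount(img.ravel(), minlength=256)
print(hist[0], hist[128], hist[255])   # 2 1 3
```

A histogram piled up at the extremes suggests clipping; one squeezed into a narrow band suggests low contrast.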
19 How does a video relate to images? ⚡ easy
Answer: A sequence of frames (2D images) sampled in time with a frame rate (FPS). Temporal redundancy enables compression and tracking; many CV models treat frames independently at first.
20 What is EXIF metadata? ⚡ easy
Answer: Embedded tags in JPEG/TIFF: orientation, camera settings, timestamp, GPS. The orientation tag can rotate images—some loaders ignore it, causing inconsistent training data; preprocess to canonical orientation.

Image Basics Cheat Sheet

Representation
  • Grid of pixels
  • Sampling + quantization
  • H×W×C / dtypes
Quality
  • Resolution & aspect
  • Aliasing on resize
  • JPEG artifacts
Code pitfalls
  • BGR vs RGB
  • float range [0,1] vs [0,255]
  • (row,col) vs (x,y)

💡 Pro tip: State image shape, dtype, and color order before any algorithm.

Full tutorial track

Go deeper with the matching tutorial chapter and code examples.