Computer Vision Chapter 2

Image Processing Pipeline

Color spaces, geometric transformations, convolution filtering, and edge detection for classical computer vision preprocessing.

Color spaces

Why color spaces matter

A color space defines how we encode color as numbers. RGB (red, green, blue) is natural for displays, which mix light. For algorithms, RGB channels are correlated: a change in illumination shifts all three, which makes simple thresholding on “red” unstable. Spaces like HSV separate hue (what color) from brightness, which helps when picking out a colored object under varying light. LAB is designed to be more perceptually uniform: equal numeric steps are closer to equal visual differences than in RGB.

RGB and OpenCV’s BGR

Each pixel stores three channel values. By convention RGB orders channels R, G, B, but OpenCV loads color images as BGR by default, so img[y, x] returns [B, G, R]. Matplotlib and most deep-learning stacks expect RGB, so always convert when mixing tools.

import cv2

bgr = cv2.imread("photo.jpg")
if bgr is not None:
    rgb = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)
    # rgb ready for matplotlib: plt.imshow(rgb)

Grayscale from color

Grayscale collapses three channels into one luminance value. A common weighted sum (ITU-R BT.601, used in many libraries) is:

Y ≈ 0.299·R + 0.587·G + 0.114·B — green gets the largest weight because human vision is most sensitive to mid-green.

import cv2
import numpy as np

bgr = cv2.imread("photo.jpg")
gray_cv = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)  # same idea as luminance formula

# Manual on a float RGB array in [0,1] (illustrative):
rgb = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB).astype(np.float32) / 255.0
y = 0.299 * rgb[...,0] + 0.587 * rgb[...,1] + 0.114 * rgb[...,2]

OpenCV’s COLOR_BGR2GRAY applies the same weights in fixed-point integer arithmetic for 8-bit images, so its output typically agrees with the floating-point formula to within rounding.
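
As a quick arithmetic check of the luma weights, the sketch below applies them to the three pure 8-bit primaries. It uses NumPy only, and the names primaries_rgb and luma_u8 are illustrative.

```python
import numpy as np

# Pure red, green, and blue as 8-bit RGB triples (illustrative test colors).
primaries_rgb = np.array([[255, 0, 0], [0, 255, 0], [0, 0, 255]], dtype=np.float64)

# BT.601 luma weights, in R, G, B order.
weights = np.array([0.299, 0.587, 0.114])

luma = primaries_rgb @ weights             # weighted sum per color
luma_u8 = np.rint(luma).astype(np.uint8)   # round to the nearest 8-bit level
print(luma_u8)  # [ 76 150  29]
```

Pure green comes out about twice as bright as pure red and five times as bright as pure blue, which is why a blue object can look nearly black in grayscale.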

HSV: hue, saturation, value

Hue is the color wheel angle (often 0–179 in OpenCV’s 8-bit HSV). Saturation is “how vivid” the color is (gray → 0). Value is brightness. Separating hue from value makes it easier to threshold a specific color range (balls, fruits, signs) under modest lighting changes.

import cv2

bgr = cv2.imread("photo.jpg")
hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)

# Example: keep pixels with hue between h0 and h1 (tune per scene)
h0, h1 = 20, 35
mask = (hsv[:, :, 0] >= h0) & (hsv[:, :, 0] <= h1)
# combine with saturation/value thresholds for cleaner masks

In OpenCV, H is 0–179 for 8-bit images (half the usual 0–360° hue range, so it fits in a byte). Always visualize masks; lighting and white balance change optimal ranges.

LAB and YCrCb

LAB (L*a*b*)

L* is lightness; a* and b* are color-opponent channels. Useful for perceptual distance and some segmentation tasks. OpenCV: cv2.COLOR_BGR2LAB.

YCrCb

Separates luma (Y) from chrominance (Cr, Cb). Common in video and JPEG. Skin-color heuristics sometimes use Cr/Cb. OpenCV: cv2.COLOR_BGR2YCrCb.

import cv2

bgr = cv2.imread("photo.jpg")
lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB)
ycrcb = cv2.cvtColor(bgr, cv2.COLOR_BGR2YCrCb)

Conversion tips

  • Prefer cv2.cvtColor over hand-rolled formulas unless you need custom behavior—rounding and ranges are handled consistently.
  • After conversion, check dtype and value ranges (HSV and LAB ranges differ from 0–255 per channel in OpenCV).
  • For ML, many models expect RGB, NCHW or NHWC normalized floats; convert once in the dataset pipeline.

Takeaways

  • OpenCV color images are BGR; convert for RGB consumers.
  • HSV helps separate hue from brightness for simple color-based masks.
  • LAB and YCrCb separate lightness from chroma for perceptual or video-style processing.

Quick FAQ

For 8-bit HSV, OpenCV stores hue as half degrees (0–179 maps to 0°–358°) so values fit in a byte.

No single color space is best for every task; choose by task. CNNs often learn in RGB; classical segmentation may benefit from HSV or LAB. Experiment and validate on your data.

Image transformations

What is a geometric transform?

Each output pixel asks “which input coordinate should I sample?” That mapping can be a simple scale and shift, a rotation around a point, or a more general homography (perspective). Implementation samples the source image at (possibly fractional) locations using interpolation—nearest neighbor, bilinear, bicubic—trading speed vs smoothness.
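
Sampling at a fractional location can be sketched with bilinear interpolation in plain NumPy. This is an illustrative single-pixel version for a grayscale float image, not how OpenCV implements it; the function name bilinear_sample is hypothetical.

```python
import numpy as np

def bilinear_sample(img, x, y):
    """Sample a 2D float image at fractional (x, y); assumes 0 <= x <= w-1 and 0 <= y <= h-1."""
    h, w = img.shape
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)  # clamp at the last row/column
    fx, fy = x - x0, y - y0                           # fractional offsets in [0, 1)
    top = (1 - fx) * img[y0, x0] + fx * img[y0, x1]
    bottom = (1 - fx) * img[y1, x0] + fx * img[y1, x1]
    return (1 - fy) * top + fy * bottom

img = np.array([[0.0, 10.0],
                [20.0, 30.0]])
center = bilinear_sample(img, 0.5, 0.5)  # 15.0, the mean of the four neighbors
```

Nearest neighbor would instead round (x, y) and return a single pixel, which is faster but produces blocky results.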

Affine

Parallel lines stay parallel. Covers translation, rotation, scale, and shear. Represented by a 2×3 matrix and applied with warpAffine.

Perspective

Converging lines (e.g. road edges) are allowed. A 3×3 homography maps one plane to another; use warpPerspective for document “deskew” or bird’s-eye views.

Resize and scaling

cv2.resize changes width and height. For downscaling, INTER_AREA often reduces aliasing by averaging; for upscaling or small shifts, INTER_LINEAR is a common default. INTER_CUBIC can look smoother but costs more.

import cv2

img = cv2.imread("photo.jpg")
h, w = img.shape[:2]
half = cv2.resize(img, (w // 2, h // 2), interpolation=cv2.INTER_AREA)
double = cv2.resize(img, None, fx=2, fy=2, interpolation=cv2.INTER_LINEAR)

Translation and rotation

Translation shifts the image by (tx, ty) pixels. Rotation is usually done around a center (often the image center) with optional scale. OpenCV builds a 2×3 affine matrix and applies warpAffine.

import cv2
import numpy as np

img = cv2.imread("photo.jpg")
h, w = img.shape[:2]

# Translation only: [[1,0,tx],[0,1,ty]]
M_t = np.float32([[1, 0, 50], [0, 1, 30]])
shifted = cv2.warpAffine(img, M_t, (w, h), borderMode=cv2.BORDER_CONSTANT, borderValue=(0,0,0))

cx, cy = w / 2, h / 2
angle, scale = 15, 1.0
M_r = cv2.getRotationMatrix2D((cx, cy), angle, scale)
rotated = cv2.warpAffine(img, M_r, (w, h), borderMode=cv2.BORDER_REPLICATE)

BORDER_CONSTANT fills unknown areas with a color; BORDER_REPLICATE extends edge pixels—pick what fits your pipeline.

Flip and crop

cv2.flip mirrors horizontally (1), vertically (0), or both (-1)—cheap data augmentation for classification. Crop is just NumPy slicing img[y0:y1, x0:x1]; combine with resize to standard input sizes for networks.

import cv2

img = cv2.imread("photo.jpg")
h, w = img.shape[:2]
aug_hflip = cv2.flip(img, 1)
patch = img[100:300, 50:250]

Perspective warp (homography)

Given four source corners and four destination corners (e.g. a tilted document → rectangle), getPerspectiveTransform + warpPerspective rectifies the plane. Quality depends on accurate corner detection.

import cv2
import numpy as np

img = cv2.imread("doc.jpg")
h, w = img.shape[:2]
src = np.float32([[100, 200], [400, 180], [420, 600], [80, 620]])
dst = np.float32([[0, 0], [w - 1, 0], [w - 1, h - 1], [0, h - 1]])
M = cv2.getPerspectiveTransform(src, dst)
flat = cv2.warpPerspective(img, M, (w, h))

Practical notes

  • Bounding boxes and keypoints must be transformed with the same geometry as the image—otherwise labels no longer align.
  • Augmentation stacks (rotate + crop + flip) should respect label semantics (e.g. left/right asymmetry in traffic signs).
  • For video, temporal consistency matters: random heavy warps every frame can hurt trackers.
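
Keeping labels aligned means pushing their coordinates through the same matrix as the image. A minimal NumPy sketch for 2×3 affine matrices follows; the helper names transform_points and transform_box are hypothetical, and the box helper returns the axis-aligned box around the transformed corners, which grows under rotation.

```python
import numpy as np

def transform_points(points_xy, M):
    """Apply a 2x3 affine matrix to an (N, 2) array of (x, y) points."""
    pts = np.asarray(points_xy, dtype=np.float64)
    homogeneous = np.hstack([pts, np.ones((len(pts), 1))])   # (N, 3): append 1 to each point
    return homogeneous @ np.asarray(M, dtype=np.float64).T    # (N, 2)

def transform_box(box_xyxy, M):
    """Map an axis-aligned box through M and return the axis-aligned box around the result."""
    x0, y0, x1, y1 = box_xyxy
    corners = transform_points([[x0, y0], [x1, y0], [x1, y1], [x0, y1]], M)
    lo, hi = corners.min(axis=0), corners.max(axis=0)
    return (float(lo[0]), float(lo[1]), float(hi[0]), float(hi[1]))

M_t = np.float32([[1, 0, 50], [0, 1, 30]])      # translation by (50, 30)
moved = transform_box((10, 20, 110, 80), M_t)   # (60.0, 50.0, 160.0, 110.0)
```

The same functions accept the matrix returned by cv2.getRotationMatrix2D, so image and labels can share one M.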

Takeaways

  • Use resize with INTER_AREA when shrinking.
  • getRotationMatrix2D + warpAffine covers most in-plane rigid and similarity transforms (rotation, translation, uniform scale).
  • warpPerspective for plane-to-plane rectification when perspective is strong.

Quick FAQ

After rotation, corners of the original image may fall outside the rectangular output canvas, and parts of the canvas have no source pixels. Those areas are filled according to the border mode unless you enlarge the canvas to fit the rotated image.

Affine preserves parallelism; homography models perspective between two planes (e.g. camera tilt). Use homography when parallel lines in the world appear to meet in the image.

Image filtering

Convolution in one minute

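At each location, place the kernel over the image, multiply overlapping values, and sum (often with normalization); that sum becomes the new center pixel. Strictly, convolution flips the kernel in both axes first, while correlation uses it as given. OpenCV’s filter2D computes correlation, which gives the same result for symmetric kernels; for a centered, asymmetric kernel, flip it with cv2.flip(kernel, -1) to get true convolution. Borders need a policy: extend edges, reflect, wrap, or constant padding.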
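
The difference between correlation and convolution is easiest to see on an impulse image. The NumPy sketch below is a slow, illustrative implementation for odd-sized kernels with reflect padding; the function names are hypothetical.

```python
import numpy as np

def correlate2d(img, kernel):
    """Slide the kernel as given (no flip) with reflect padding; odd kernel sizes only."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)), mode="reflect")
    out = np.zeros(img.shape, dtype=np.float64)
    for i in range(kh):
        for j in range(kw):
            # Each kernel tap adds a shifted, weighted copy of the image.
            out += kernel[i, j] * padded[i:i + img.shape[0], j:j + img.shape[1]]
    return out

def convolve2d(img, kernel):
    """True convolution: flip the kernel in both axes, then correlate."""
    return correlate2d(img, kernel[::-1, ::-1])

impulse = np.zeros((5, 5))
impulse[2, 2] = 1.0
kernel = np.arange(1.0, 10.0).reshape(3, 3)  # asymmetric, so the flip is visible

corr = correlate2d(impulse, kernel)  # center 3x3 shows the kernel flipped
conv = convolve2d(impulse, kernel)   # center 3x3 reproduces the kernel
```

For symmetric kernels such as box or Gaussian blur the two results are identical, which is why the distinction rarely matters for smoothing.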

Low-pass (blur)

Weights positive and localized—averages neighbors, suppresses high frequencies (noise, fine texture).

High-pass (sharpen)

Emphasizes differences from neighbors—edges pop; also amplifies noise if overdone.

Box and Gaussian blur

Box blur (cv2.blur) uses equal weights—fast but can look “blocky.” Gaussian blur weights fall off with distance from the center; sigma controls spread. Larger kernels or sigmas mean more smoothing and softer edges.

import cv2

img = cv2.imread("photo.jpg")
box = cv2.blur(img, (5, 5))
gauss = cv2.GaussianBlur(img, (0, 0), sigmaX=1.5)  # ksize (0,0) → from sigma
gauss_k = cv2.GaussianBlur(img, (7, 7), 0)

Gaussian kernels are separable (apply 1D horizontal then vertical)—that is why large Gaussian blurs stay efficient.
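
Separability can be checked numerically: the outer product of a 1D Gaussian with itself equals the 2D Gaussian built from the direct formula. This NumPy sketch uses an illustrative 7-tap kernel with sigma 1.5; the helper name gaussian_1d is hypothetical.

```python
import numpy as np

def gaussian_1d(ksize, sigma):
    """Normalized 1D Gaussian weights for an odd kernel size."""
    x = np.arange(ksize) - (ksize - 1) / 2
    g = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return g / g.sum()

ksize, sigma = 7, 1.5
g = gaussian_1d(ksize, sigma)
kernel_2d = np.outer(g, g)   # 2D Gaussian = column vector times row vector

# Direct 2D formula for comparison.
yy, xx = np.mgrid[:ksize, :ksize] - (ksize - 1) / 2
direct = np.exp(-(xx ** 2 + yy ** 2) / (2 * sigma ** 2))
direct /= direct.sum()

# Per-pixel multiplications: one 2D pass versus two 1D passes.
cost_2d, cost_separable = ksize * ksize, 2 * ksize   # 49 versus 14
```

The saving grows with kernel size, since the 2D cost is quadratic in ksize and the separable cost is linear.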

Median and edge-aware smoothing

Median blur replaces the center with the neighborhood median—excellent for salt-and-pepper noise while preserving step edges better than linear blur. Kernel size must be odd.

import cv2

img = cv2.imread("noisy.jpg")
med = cv2.medianBlur(img, 5)

# Optional: bilateral filter — smooths flat regions, keeps strong edges (slower)
bilateral = cv2.bilateralFilter(img, d=9, sigmaColor=75, sigmaSpace=75)

Sharpening with filter2D

A simple 3×3 sharpen kernel adds a multiple of the Laplacian-like response (center positive, neighbors negative). Tune strength for your noise level.

import cv2
import numpy as np

img = cv2.imread("photo.jpg")
kernel = np.array([[ 0, -1,  0],
                   [-1,  5, -1],
                   [ 0, -1,  0]], dtype=np.float32)
sharp = cv2.filter2D(img, -1, kernel)

Depth -1 means output matches source depth; for float experiments, convert to float32, filter, clip, then back to uint8.

Borders and channels

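Filters need pixel values beyond the image boundary. Most OpenCV filtering functions (blur, GaussianBlur, filter2D, Sobel) default to BORDER_DEFAULT, which is reflection without repeating the edge pixel (BORDER_REFLECT_101), and accept a borderType argument; medianBlur replicates edge pixels. For color images, the blur APIs process each channel independently in the same way, and Gaussian blur uses the separable form.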

Takeaways

  • Gaussian blur for general-purpose smoothing before subsampling or noise reduction.
  • Median for impulse noise; bilateral when you need edge-preserving smoothing.
  • filter2D is the Swiss Army knife for custom kernels (sharpen, emboss, small conv nets).

Quick FAQ

Edges are high-frequency structure. Too much low-pass smoothing rounds off gradients, so peaks in the derivative shrink or vanish—tune blur to noise, not more than needed.

Classical filtering and CNN convolution share the same local sliding-window idea. The difference is that CNNs learn their kernels from data, while classical filters use fixed, hand-designed weights for interpretable preprocessing.

Edge detection

Gradients on a grid

For a 2D image I(x, y), the gradient ∇I = (∂I/∂x, ∂I/∂y) points in the direction of steepest brightness increase. On a discrete grid we approximate partial derivatives with finite differences—exactly what convolution kernels like Sobel implement. The gradient magnitude |∇I| ≈ √(Gx² + Gy²) highlights edges regardless of orientation; the angle atan2(Gy, Gx) describes edge direction.

Gx, Gy

Horizontal and vertical derivative images. Large values where intensity jumps across columns or rows.

Magnitude

Combines both directions—useful for visualization and non-max suppression in Canny.

Pre-blur

Derivatives amplify noise. A little Gaussian blur before Sobel/Canny stabilizes results.
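
The link between finite differences and the Sobel kernel can be made concrete in NumPy: the 3×3 Sobel kernel is the outer product of a small smoothing filter and a central difference. The ramp image is an illustrative test case.

```python
import numpy as np

smooth = np.array([1, 2, 1])        # averaging across the derivative direction
derivative = np.array([-1, 0, 1])   # central difference

sobel_x = np.outer(smooth, derivative)   # responds to changes along x (columns)
sobel_y = np.outer(derivative, smooth)   # responds to changes along y (rows)

# Horizontal ramp I(x, y) = 3x: the x-derivative is 3 everywhere, the y-derivative is 0.
ramp = np.tile(3.0 * np.arange(6), (4, 1))
gy, gx = np.gradient(ramp)               # NumPy returns derivatives in (row, column) order
magnitude = np.sqrt(gx ** 2 + gy ** 2)
angle = np.arctan2(gy, gx)               # 0 rad: brightness increases along +x
```

Note that the unnormalized Sobel kernel scales the derivative (by 8 for the 3×3 case on a unit ramp), so raw Sobel magnitudes are not in intensity units per pixel.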

Sobel and Scharr

cv2.Sobel computes separable 3×3 (or larger, odd) derivative filters. Use CV_64F (float) to avoid clipping negative slopes, then rescale for display. Scharr is a 3×3 variant with slightly better rotation invariance for equal cost.

import cv2
import numpy as np

gray = cv2.imread("photo.jpg", cv2.IMREAD_GRAYSCALE)
blur = cv2.GaussianBlur(gray, (5, 5), 0)

gx = cv2.Sobel(blur, cv2.CV_64F, 1, 0, ksize=3)
gy = cv2.Sobel(blur, cv2.CV_64F, 0, 1, ksize=3)
mag = np.sqrt(gx * gx + gy * gy)
mag_u8 = np.uint8(255 * mag / (mag.max() + 1e-6))

gx_s = cv2.Scharr(blur, cv2.CV_64F, 1, 0)
gy_s = cv2.Scharr(blur, cv2.CV_64F, 0, 1)

Convert to uint8 only for saving or imshow; keep floats for further math.

Gradient angle (optional)

angle = np.arctan2(gy, gx)  # radians, in [-pi, pi]

Laplacian

The Laplacian is a second-derivative operator (the sum of ∂²/∂x² and ∂²/∂y²). It is sensitive to noise but highlights fine detail and produces zero-crossings at edges. It is often used in sharpening (similar in effect to an unsharp mask) and as a building block in blob detection.

import cv2

gray = cv2.imread("photo.jpg", cv2.IMREAD_GRAYSCALE)
lap = cv2.Laplacian(gray, cv2.CV_64F, ksize=3)
lap_vis = cv2.convertScaleAbs(lap)

convertScaleAbs maps floats to 8-bit for display after Laplacian/Sobel.
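
The zero-crossing behavior is easy to see in one dimension. The sketch below takes first and second differences of a blurred step profile; the sample values are illustrative.

```python
import numpy as np

# 1D intensity profile across a blurred step edge (illustrative values).
profile = np.array([0, 0, 1, 4, 9, 12, 13, 13], dtype=np.float64)

first = np.diff(profile)         # slope: peaks in the middle of the edge
second = np.diff(profile, n=2)   # curvature: positive, then negative

# A zero-crossing lies between consecutive samples whose signs differ.
crossings = np.where(np.sign(second[:-1]) * np.sign(second[1:]) < 0)[0]
```

The sign change in the second difference lines up with the peak of the first difference, which is the edge location.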

Canny edge detector

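Canny is a multi-stage algorithm: Gaussian smoothing, gradient magnitude and angle, non-maximum suppression (thinning each edge to the local maximum along the gradient direction, that is, across the edge), and hysteresis thresholding with a low and a high threshold. Pixels above threshold2 are strong edges; weak edges between the two thresholds are kept only if connected to strong ones, which reduces broken fragments. cv2.Canny does not smooth its input itself, so apply the Gaussian blur first.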

import cv2

gray = cv2.imread("photo.jpg", cv2.IMREAD_GRAYSCALE)
blur = cv2.GaussianBlur(gray, (5, 5), 0)

# Classic ratio: t_high : t_low often ~ 2:1 or 3:1; tune per image scale
t1, t2 = 50, 150
edges = cv2.Canny(blur, t1, t2, apertureSize=3, L2gradient=False)
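
Hysteresis thresholding, the last Canny stage, can be sketched in plain NumPy. This is an illustrative version under simple assumptions (8-connectivity, repeated growth until nothing changes), not how cv2.Canny is implemented; the function name hysteresis is hypothetical.

```python
import numpy as np

def hysteresis(mag, t_low, t_high):
    """Keep strong pixels plus weak pixels 8-connected to them (assumes t_low <= t_high)."""
    strong = mag >= t_high
    candidate = mag >= t_low          # strong and weak pixels together
    edges = strong.copy()
    h, w = edges.shape
    while True:
        padded = np.pad(edges, 1, mode="constant", constant_values=False)
        grown = np.zeros_like(edges)
        for dy in (0, 1, 2):
            for dx in (0, 1, 2):
                grown |= padded[dy:dy + h, dx:dx + w]   # 3x3 dilation by shifting
        grown &= candidate            # only candidates may join an edge
        if np.array_equal(grown, edges):
            return edges
        edges = grown

mag = np.array([[0,  60, 0, 0,  0],
                [0,  60, 0, 0,  0],
                [0, 200, 0, 0, 60],
                [0,   0, 0, 0,  0]])
result = hysteresis(mag, t_low=50, t_high=150)
```

The weak pixels in column 1 survive because they chain to the strong pixel, while the isolated weak pixel at the right is dropped.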

More Canny examples

# Finer edges (noisier): lower thresholds
fine = cv2.Canny(blur, 30, 90)

# Fewer, stronger edges: raise thresholds
coarse = cv2.Canny(blur, 100, 200)

# L2gradient=True uses sqrt(Gx^2 + Gy^2) for magnitude; the default (False) uses |Gx| + |Gy|
edges_l2 = cv2.Canny(blur, 50, 150, apertureSize=3, L2gradient=True)

# Larger Sobel aperture inside Canny (5 or 7): slightly smoother gradients
edges_ap5 = cv2.Canny(blur, 50, 150, apertureSize=5)

Practical pipeline

For document or industrial images, thresholds are easier to set if intensities are normalized. One pattern: blur → optional CLAHE (see histogram chapter) → Canny. For color images, either convert to grayscale or run Canny per channel and combine with bitwise OR.

import cv2
import numpy as np

bgr = cv2.imread("scene.jpg")
gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
gray = cv2.GaussianBlur(gray, (5, 5), 0)

med = np.median(gray)
t1 = int(max(0, 0.66 * med))
t2 = int(min(255, 1.33 * med))
auto_edges = cv2.Canny(gray, t1, t2)

Median-based heuristic for thresholds—starting point only; validate on your data.

Takeaways

  • Use float Sobel/Scharr, then visualize with convertScaleAbs or manual scaling.
  • Canny needs two thresholds and benefits from light pre-blur.
  • Tune thresholds per resolution and lighting; consider auto heuristics then refine.

Quick FAQ

Sobel gives thick “ridges” of high gradient; Canny thins them and links with hysteresis for cleaner contours. For many segmentation pre-steps, Canny is easier to use out of the box.

Too little blur → noise becomes edges. Too much → true edges weaken and disappear. Match blur σ to noise scale and image resolution.
