Color spaces
Why color spaces matter
A color space defines how we encode color as numbers. RGB (red, green, blue) is natural for displays, which mix light. For algorithms, RGB channels are correlated: changing the illumination shifts all three, which makes simple thresholding on “red” unstable. Spaces like HSV separate hue (which color) from brightness, which helps when picking out a colored object under varying light. LAB is designed to be more perceptually uniform: equal numeric steps correspond more closely to equal visual differences than they do in RGB.
RGB and OpenCV’s BGR
Each pixel stores three channel values. Conventionally, RGB orders them R, G, B, but OpenCV loads color images as BGR by default, so img[y, x] returns [B, G, R]. Matplotlib and most deep-learning stacks expect RGB, so always convert when mixing tools.
import cv2
bgr = cv2.imread("photo.jpg")
if bgr is not None:
    rgb = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)
    # rgb is ready for matplotlib: plt.imshow(rgb)
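Since BGR and RGB hold the same three values in opposite order, the conversion amounts to reversing the channel axis. The NumPy sketch below illustrates this on a one-pixel synthetic image; the helper name `bgr_to_rgb` is ours, and for 8-bit three-channel input it should agree with cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB).

```python
import numpy as np

def bgr_to_rgb(img):
    """Reverse the channel axis (BGR <-> RGB); the same call converts back."""
    # [..., ::-1] is a view with negative strides; ascontiguousarray copies it
    # so libraries that require contiguous memory accept the result.
    return np.ascontiguousarray(img[..., ::-1])

# One pixel stored OpenCV-style as [B, G, R] = pure red.
bgr = np.zeros((1, 1, 3), dtype=np.uint8)
bgr[0, 0] = [0, 0, 255]
rgb = bgr_to_rgb(bgr)
print(rgb[0, 0].tolist())  # [255, 0, 0]
```

The same reversal converts RGB back to BGR, for example before handing a Matplotlib-style array to cv2.imwrite.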
Grayscale from color
Grayscale collapses three channels into one luminance value. A common weighted sum (ITU-R BT.601, used in many libraries) is:
Y ≈ 0.299·R + 0.587·G + 0.114·B — green gets the largest weight because human vision is most sensitive to mid-green.
import cv2
import numpy as np
bgr = cv2.imread("photo.jpg")
gray_cv = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY) # same idea as luminance formula
# Manual on a float RGB array in [0,1] (illustrative):
rgb = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB).astype(np.float32) / 255.0
y = 0.299 * rgb[...,0] + 0.587 * rgb[...,1] + 0.114 * rgb[...,2]
OpenCV’s COLOR_BGR2GRAY applies these same BT.601 weights, using fixed-point integer arithmetic for 8-bit images, so its output matches the formula above up to rounding.
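The weighted sum can also be applied directly to an OpenCV-style uint8 BGR array, as long as the weights are listed in B, G, R order. This is a NumPy sketch (the helper name `bgr_to_gray` is ours); OpenCV's fixed-point version may differ by one gray level on some pixels.

```python
import numpy as np

# BT.601 luma weights, ordered for OpenCV's BGR channel layout.
BGR_WEIGHTS = np.array([0.114, 0.587, 0.299])

def bgr_to_gray(bgr):
    """Weighted sum over the channel axis, rounded and saturated to uint8."""
    y = bgr.astype(np.float64) @ BGR_WEIGHTS   # (..., 3) @ (3,) -> (...)
    return np.clip(np.rint(y), 0, 255).astype(np.uint8)

# Pure blue, green, red and white pixels, in BGR order.
pixels = np.array([[[255, 0, 0], [0, 255, 0], [0, 0, 255], [255, 255, 255]]],
                  dtype=np.uint8)
print(bgr_to_gray(pixels).tolist())  # [[29, 150, 76, 255]]
```

Pure green maps to a much brighter gray than pure blue, which is the sensitivity argument above expressed in numbers.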
HSV: hue, saturation, value
Hue is the color wheel angle (often 0–179 in OpenCV’s 8-bit HSV). Saturation is “how vivid” the color is (gray → 0). Value is brightness. Separating hue from value makes it easier to threshold a specific color range (balls, fruits, signs) under modest lighting changes.
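To see how those three quantities map onto OpenCV's 8-bit encoding, the sketch below uses Python's standard colorsys module (which works on floats in [0, 1] and returns hue as a fraction of a full turn) and rescales the result: hue to 0–179, saturation and value to 0–255. The helper `rgb_to_opencv_hsv` is hypothetical, and OpenCV's own rounding may differ by one level for some inputs.

```python
import colorsys

def rgb_to_opencv_hsv(r, g, b):
    """Map one 8-bit RGB pixel to OpenCV-style 8-bit HSV (H 0-179, S and V 0-255)."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    # h * 360 would be degrees; h * 180 is OpenCV's halved 8-bit scale.
    return round(h * 180) % 180, round(s * 255), round(v * 255)

for name, rgb in [("red", (255, 0, 0)), ("yellow", (255, 255, 0)),
                  ("green", (0, 255, 0)), ("blue", (0, 0, 255)),
                  ("gray", (128, 128, 128))]:
    print(name, rgb_to_opencv_hsv(*rgb))
```

Primary colors land 60 units apart on the halved scale (red 0, yellow 30, green 60, blue 120), and the gray pixel gets saturation 0, as described above.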
import cv2
bgr = cv2.imread("photo.jpg")
hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
# Example: keep pixels with hue between h0 and h1 (tune per scene)
h0, h1 = 20, 35
mask = (hsv[:, :, 0] >= h0) & (hsv[:, :, 0] <= h1)
# combine with saturation/value thresholds for cleaner masks
In OpenCV, H spans 0–179 for 8-bit images: the usual 0–360° hue angle is halved so it fits in one byte (float images keep 0–360). Always visualize masks; lighting and white balance shift the optimal ranges.
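The comment in the code above suggests adding saturation and value thresholds. One way to do that, sketched below with a hypothetical `hsv_mask` helper, also handles red, whose hue wraps around from 179 back to 0. For a non-wrapping range, cv2.inRange with lower and upper HSV bounds does the same job; the default thresholds here are placeholders, not tuned values.

```python
import numpy as np

def hsv_mask(hsv, h_lo, h_hi, s_min=50, v_min=50):
    """Boolean mask for an OpenCV-style 8-bit HSV image (H in 0-179).

    If h_lo > h_hi the hue range wraps past 179 back to 0, which red needs.
    """
    h, s, v = hsv[..., 0], hsv[..., 1], hsv[..., 2]
    if h_lo <= h_hi:
        hue_ok = (h >= h_lo) & (h <= h_hi)
    else:
        hue_ok = (h >= h_lo) | (h <= h_hi)
    # Near-gray (low S) and near-black (low V) pixels have unreliable hue.
    return hue_ok & (s >= s_min) & (v >= v_min)

hsv = np.array([[[30, 200, 200],    # vivid yellow
                 [30, 10, 200],     # washed out, nearly gray
                 [175, 200, 200],   # red just below 180
                 [3, 200, 200]]],   # red just above 0
               dtype=np.uint8)
print(hsv_mask(hsv, 20, 35).tolist())   # [[True, False, False, False]]
print(hsv_mask(hsv, 170, 10).tolist())  # [[False, False, True, True]]
```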
LAB and YCrCb
LAB (L*a*b*)
L* is lightness; a* and b* are color-opponent channels. Useful for perceptual distance and some segmentation tasks. OpenCV: cv2.COLOR_BGR2LAB.
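To make “perceptual distance” concrete, the sketch below converts sRGB to CIELAB with the standard D65 formulas and measures color difference as plain Euclidean distance (CIE76 ΔE, the simplest of several ΔE formulas). It takes RGB order, not BGR, and returns float L* in 0–100; OpenCV's 8-bit COLOR_BGR2LAB output instead scales L* by 255/100 and adds 128 to a* and b*, so undo that before comparing. The function names are ours.

```python
import numpy as np

# sRGB (D65) -> XYZ matrix and the D65 reference white.
M_RGB2XYZ = np.array([[0.4124, 0.3576, 0.1805],
                      [0.2126, 0.7152, 0.0722],
                      [0.0193, 0.1192, 0.9505]])
WHITE_D65 = np.array([0.95047, 1.0, 1.08883])

def srgb_to_lab(rgb8):
    """Convert 8-bit sRGB values, shape (..., 3), to float CIELAB."""
    c = np.asarray(rgb8, dtype=np.float64) / 255.0
    # Undo the sRGB transfer curve to get linear light.
    lin = np.where(c <= 0.04045, c / 12.92, ((c + 0.055) / 1.055) ** 2.4)
    xyz = lin @ M_RGB2XYZ.T / WHITE_D65
    delta = 6 / 29
    f = np.where(xyz > delta ** 3, np.cbrt(xyz), xyz / (3 * delta ** 2) + 4 / 29)
    L = 116 * f[..., 1] - 16
    a = 500 * (f[..., 0] - f[..., 1])
    b = 200 * (f[..., 1] - f[..., 2])
    return np.stack([L, a, b], axis=-1)

def delta_e76(lab1, lab2):
    """CIE76 color difference: Euclidean distance in LAB."""
    return np.linalg.norm(np.asarray(lab1) - np.asarray(lab2), axis=-1)

white = srgb_to_lab([255, 255, 255])
black = srgb_to_lab([0, 0, 0])
print(np.round(white, 2), np.round(black, 2), delta_e76(white, black))
```

White comes out near (100, 0, 0) and black at (0, 0, 0), so their ΔE is about 100, the full lightness range.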
YCrCb
Separates luma (Y) from chrominance (Cr, Cb). Common in video and JPEG. Skin-color heuristics sometimes use Cr/Cb. OpenCV: cv2.COLOR_BGR2YCrCb.
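The YCrCb conversion is a linear mix of channels followed by an offset that centers the chroma channels at 128 for 8-bit data. The sketch below follows the formula given in OpenCV's color-conversion documentation (the function name is ours) and shows that neutral grays land at Cr = Cb = 128, so all of their information sits in Y.

```python
import numpy as np

def bgr_to_ycrcb(bgr8):
    """8-bit BGR -> YCrCb (channel order Y, Cr, Cb, as in COLOR_BGR2YCrCb)."""
    b, g, r = [bgr8[..., i].astype(np.float64) for i in range(3)]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cr = (r - y) * 0.713 + 128   # 128 = chroma offset for 8-bit images
    cb = (b - y) * 0.564 + 128
    out = np.stack([y, cr, cb], axis=-1)
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)

grays = np.array([[[0, 0, 0], [128, 128, 128], [255, 255, 255]]], dtype=np.uint8)
print(bgr_to_ycrcb(grays).tolist())  # [[[0, 128, 128], [128, 128, 128], [255, 128, 128]]]
```

A saturated color, by contrast, pushes Cr or Cb far from 128; that chroma-only signal is what Cr/Cb skin heuristics threshold.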
import cv2
bgr = cv2.imread("photo.jpg")
lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB)
ycrcb = cv2.cvtColor(bgr, cv2.COLOR_BGR2YCrCb)
Conversion tips
- Prefer cv2.cvtColor over hand-rolled formulas unless you need custom behavior; rounding and ranges are handled consistently.
- After conversion, check dtype and value ranges (OpenCV's HSV and LAB channels do not all follow a plain 0–255 scale; 8-bit hue, for example, stops at 179).
- For ML, many models expect RGB input as normalized floats in NCHW or NHWC layout; convert once in the dataset pipeline.
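The last two tips can be turned into small helpers. Both names are hypothetical; the [0, 1] scaling and NCHW default are assumptions, since the exact normalization (including any per-channel mean and standard deviation) depends on the model.

```python
import numpy as np

def channel_ranges(img):
    """Return the dtype and per-channel (min, max) so range surprises show up early."""
    chans = img.reshape(-1, img.shape[-1]) if img.ndim == 3 else img.reshape(-1, 1)
    return str(img.dtype), [(c.min().item(), c.max().item()) for c in chans.T]

def to_model_input(bgr8, layout="NCHW"):
    """BGR uint8 (H, W, 3) -> RGB float32 in [0, 1] with a leading batch axis.

    layout is "NCHW" (channels first) or "NHWC" (channels last).
    """
    rgb = bgr8[..., ::-1].astype(np.float32) / 255.0
    if layout == "NCHW":
        rgb = rgb.transpose(2, 0, 1)
    return np.ascontiguousarray(rgb[np.newaxis])

img = np.zeros((4, 6, 3), dtype=np.uint8)
img[..., 2] = 255                      # red channel (last in BGR) saturated
print(channel_ranges(img))             # ('uint8', [(0, 0), (0, 0), (255, 255)])
print(to_model_input(img).shape)       # (1, 3, 4, 6)
```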
Takeaways
- OpenCV color images are BGR; convert for RGB consumers.
- HSV helps separate hue from brightness for simple color-based masks.
- LAB and YCrCb separate lightness from chroma for perceptual or video-style processing.
Image transformations
What is a geometric transform?
Each output pixel asks “which input coordinate should I sample?” That mapping can be a simple scale and shift, a rotation around a point, or a more general homography (perspective). The warp then samples the source image at these (possibly fractional) locations using interpolation (nearest neighbor, bilinear, or bicubic), trading speed against smoothness.
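Sampling at a fractional location is where interpolation comes in. This plain-NumPy sketch of bilinear and nearest-neighbor sampling on a single-channel image ignores borders and is illustrative only; the helper names are ours, and OpenCV's warps implement this internally with their own border handling.

```python
import numpy as np

def sample_bilinear(img, x, y):
    """Sample a 2-D image at fractional (x, y) with bilinear weights.

    Assumes 0 <= x <= w-1 and 0 <= y <= h-1 (no border handling).
    """
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, img.shape[1] - 1), min(y0 + 1, img.shape[0] - 1)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * img[y0, x0] + fx * img[y0, x1]
    bottom = (1 - fx) * img[y1, x0] + fx * img[y1, x1]
    return float((1 - fy) * top + fy * bottom)

def sample_nearest(img, x, y):
    """Take the pixel whose center is closest to (x, y)."""
    return float(img[int(round(y)), int(round(x))])

img = np.array([[0.0, 10.0],
                [20.0, 30.0]])
print(sample_bilinear(img, 0.5, 0.5))  # 15.0 (average of all four)
print(sample_nearest(img, 0.4, 0.6))   # 20.0 (snaps to row 1, column 0)
```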
Affine
Parallel lines stay parallel. Covers translation, rotation, scale, and shear. Represented by a 2×3 matrix and applied with warpAffine.
Perspective
Converging lines (e.g. road edges) are allowed. A 3×3 homography maps one plane onto another; warpPerspective applies it, e.g. for document “deskew” or bird’s-eye views.
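The difference between the two families shows up when mapping points. This sketch applies a 2×3 affine matrix and a 3×3 homography (including the division by the third homogeneous coordinate) to a few points; the matrices A and H are arbitrary illustrative values and the helper names are ours. For real code, cv2.transform and cv2.perspectiveTransform apply the same math to point arrays.

```python
import numpy as np

def apply_affine(M, pts):
    """Map (N, 2) points with a 2x3 affine matrix: [x', y'] = M @ [x, y, 1]."""
    pts = np.asarray(pts, dtype=np.float64)
    return pts @ M[:, :2].T + M[:, 2]

def apply_homography(H, pts):
    """Map (N, 2) points with a 3x3 homography, then divide by the third coordinate."""
    pts = np.asarray(pts, dtype=np.float64)
    homog = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return homog[:, :2] / homog[:, 2:3]

# Illustrative shear + translation: two parallel horizontal segments stay parallel.
A = np.array([[1.0, 0.5, 10.0],
              [0.0, 1.0, 5.0]])
seg1 = apply_affine(A, [[0, 0], [4, 0]])
seg2 = apply_affine(A, [[0, 2], [4, 2]])
print((seg1[1] - seg1[0]).tolist(), (seg2[1] - seg2[0]).tolist())  # [4.0, 0.0] [4.0, 0.0]

# Illustrative homography: the nonzero bottom row makes scale depend on y,
# so two vertical lines 100 px apart at y=0 are closer together at y=500.
H = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.001, 1.0]])
top = apply_homography(H, [[0, 0], [100, 0]])
bottom = apply_homography(H, [[0, 500], [100, 500]])
print(top[1, 0] - top[0, 0], bottom[1, 0] - bottom[0, 0])  # 100.0, then about 66.7
```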
Resize and scaling
cv2.resize changes width and height. For downscaling, INTER_AREA often reduces aliasing by averaging; for upscaling or small shifts, INTER_LINEAR is a common default. INTER_CUBIC can look smoother but costs more.
import cv2
img = cv2.imread("photo.jpg")
h, w = img.shape[:2]
half = cv2.resize(img, (w // 2, h // 2), interpolation=cv2.INTER_AREA)
double = cv2.resize(img, None, fx=2, fy=2, interpolation=cv2.INTER_LINEAR)
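For integer shrink factors, INTER_AREA's averaging behaves essentially like taking the mean of each factor × factor block. The sketch below (hypothetical helper `downscale_area`) contrasts that with naive subsampling on a 1-pixel checkerboard, where skipping pixels aliases the pattern into a flat black image.

```python
import numpy as np

def downscale_area(img, factor):
    """Shrink by an integer factor by averaging factor x factor blocks.

    Rows/columns that do not fill a whole block are cropped for simplicity.
    """
    h, w = img.shape[:2]
    h2, w2 = h // factor, w // factor
    blocks = img[:h2 * factor, :w2 * factor].astype(np.float64)
    blocks = blocks.reshape(h2, factor, w2, factor, *img.shape[2:])
    return blocks.mean(axis=(1, 3))

checker = (np.indices((4, 4)).sum(axis=0) % 2 * 255).astype(np.uint8)
print(checker[::2, ::2].tolist())           # [[0, 0], [0, 0]]  (aliased)
print(downscale_area(checker, 2).tolist())  # [[127.5, 127.5], [127.5, 127.5]]
```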
Translation and rotation
Translation shifts the image by (tx, ty) pixels. Rotation is usually done around a center (often the image center) with optional scale. OpenCV builds a 2×3 affine matrix and applies warpAffine.
import cv2
import numpy as np
img = cv2.imread("photo.jpg")
h, w = img.shape[:2]
# Translation only: [[1,0,tx],[0,1,ty]]
M_t = np.float32([[1, 0, 50], [0, 1, 30]])
shifted = cv2.warpAffine(img, M_t, (w, h), borderMode=cv2.BORDER_CONSTANT, borderValue=(0,0,0))
cx, cy = w / 2, h / 2
angle, scale = 15, 1.0
M_r = cv2.getRotationMatrix2D((cx, cy), angle, scale)
rotated = cv2.warpAffine(img, M_r, (w, h), borderMode=cv2.BORDER_REPLICATE)
BORDER_CONSTANT fills unknown areas with a color; BORDER_REPLICATE extends edge pixels—pick what fits your pipeline.
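getRotationMatrix2D builds its 2×3 matrix from α = scale·cos θ and β = scale·sin θ, with the translation terms chosen so that the rotation center maps to itself. The sketch below rebuilds that documented matrix in NumPy (the function name is ours) and checks where two points land.

```python
import numpy as np

def rotation_matrix_2d(center, angle_deg, scale):
    """Rebuild the 2x3 matrix documented for cv2.getRotationMatrix2D.

    Positive angles rotate counter-clockwise as seen on screen (y axis points down).
    """
    cx, cy = center
    a = scale * np.cos(np.radians(angle_deg))
    b = scale * np.sin(np.radians(angle_deg))
    return np.array([[a, b, (1 - a) * cx - b * cy],
                     [-b, a, b * cx + (1 - a) * cy]])

M = rotation_matrix_2d((50.0, 40.0), 90, 1.0)
# The center is a fixed point; a point to its right ends up directly above it.
for x, y in [(50.0, 40.0), (60.0, 40.0)]:
    print(np.round(M @ [x, y, 1.0], 6).tolist())  # [50.0, 40.0] then [50.0, 30.0]
```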
Flip and crop
cv2.flip mirrors horizontally (1), vertically (0), or both (-1)—cheap data augmentation for classification. Crop is just NumPy slicing img[y0:y1, x0:x1]; combine with resize to standard input sizes for networks.
import cv2
img = cv2.imread("photo.jpg")
h, w = img.shape[:2]
aug_hflip = cv2.flip(img, 1)
patch = img[100:300, 50:250]
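Both operations map directly onto NumPy slicing. The sketch below lists the slicing equivalent of each flip code and demonstrates a common pitfall: a slice is a view, so writing into a crop also writes into the source image unless you call .copy(). Note that cv2.flip itself returns a new array.

```python
import numpy as np

# Tiny 2x3 image with 3 channels so every pixel is distinguishable.
img = np.arange(2 * 3 * 3, dtype=np.uint8).reshape(2, 3, 3)

# Slicing equivalents of the cv2.flip codes; these are views, not copies.
hflip = img[:, ::-1]     # like cv2.flip(img, 1): mirror left-right
vflip = img[::-1, :]     # like cv2.flip(img, 0): mirror top-bottom
both = img[::-1, ::-1]   # like cv2.flip(img, -1): both axes

# A crop is also a view: writing to it writes into the original image.
patch = img[0:2, 1:3]
patch_copy = img[0:2, 1:3].copy()
patch[0, 0] = 255        # also sets img[0, 1]
print(img[0, 1].tolist(), patch_copy[0, 0].tolist())  # [255, 255, 255] [3, 4, 5]
```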
Perspective warp (homography)
Given four source corners and four destination corners (e.g. a tilted document → rectangle), getPerspectiveTransform + warpPerspective rectifies the plane. Quality depends on accurate corner detection.
import cv2
import numpy as np
img = cv2.imread("doc.jpg")
h, w = img.shape[:2]
src = np.float32([[100, 200], [400, 180], [420, 600], [80, 620]])
dst = np.float32([[0, 0], [w - 1, 0], [w - 1, h - 1], [0, h - 1]])
M = cv2.getPerspectiveTransform(src, dst)
flat = cv2.warpPerspective(img, M, (w, h))
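Four point pairs determine a homography because each pair contributes two linear equations for the eight unknown entries (the ninth is fixed to 1). The sketch below solves that 8×8 system directly, reusing the src corners from the example; the helper name and the 640×480 output size are assumptions. It shows the math behind getPerspectiveTransform, not necessarily its exact internal procedure. The corner order (top-left, top-right, bottom-right, bottom-left) must match between src and dst.

```python
import numpy as np

def homography_from_4_points(src, dst):
    """Solve for H (with H[2, 2] = 1) mapping 4 src points onto 4 dst points.

    (x, y) -> (u, v) gives u*(h20*x + h21*y + 1) = h00*x + h01*y + h02 and the
    analogous equation for v; four points (no three collinear) give 8 equations.
    """
    A, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rhs.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.append(v)
    h = np.linalg.solve(np.array(A, dtype=np.float64), np.array(rhs, dtype=np.float64))
    return np.append(h, 1.0).reshape(3, 3)

src = [[100, 200], [400, 180], [420, 600], [80, 620]]
w, h = 640, 480   # hypothetical output size
dst = [[0, 0], [w - 1, 0], [w - 1, h - 1], [0, h - 1]]
H = homography_from_4_points(src, dst)

# Check: every source corner lands on its destination corner.
pts = np.hstack([np.array(src, dtype=np.float64), np.ones((4, 1))]) @ H.T
mapped = pts[:, :2] / pts[:, 2:]
print(np.round(mapped, 3))
```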
Practical notes
- Bounding boxes and keypoints must be transformed with the same geometry as the image—otherwise labels no longer align.
- Augmentation stacks (rotate + crop + flip) should respect label semantics (e.g. left/right asymmetry in traffic signs).
- For video, temporal consistency matters: random heavy warps every frame can hurt trackers.
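Following the first note, labels must go through the same geometry as pixels. The hypothetical helpers below apply a warpAffine-style 2×3 matrix to keypoints, map a box by transforming its four corners and taking the enclosing axis-aligned box (which grows under rotation), and mirror a box for a horizontal flip. The flip formula assumes continuous pixel-edge coordinates (0 to width); if your boxes use inclusive pixel indices, use width - 1 - x instead.

```python
import numpy as np

def transform_points(M, pts):
    """Apply a 2x3 affine matrix (the kind passed to warpAffine) to (N, 2) points."""
    M = np.asarray(M, dtype=np.float64)
    pts = np.asarray(pts, dtype=np.float64)
    return pts @ M[:, :2].T + M[:, 2]

def transform_box(M, box):
    """Map an axis-aligned box (x0, y0, x1, y1) via its corners; return the enclosing box."""
    x0, y0, x1, y1 = box
    corners = transform_points(M, [[x0, y0], [x1, y0], [x1, y1], [x0, y1]])
    lo, hi = corners.min(axis=0), corners.max(axis=0)
    return (float(lo[0]), float(lo[1]), float(hi[0]), float(hi[1]))

def hflip_box(box, width):
    """Mirror a box to match cv2.flip(img, 1), in pixel-edge coordinates."""
    x0, y0, x1, y1 = box
    return (width - x1, y0, width - x0, y1)

M_t = np.float32([[1, 0, 50], [0, 1, 30]])     # the translation used earlier
print(transform_box(M_t, (10, 20, 110, 70)))   # (60.0, 50.0, 160.0, 100.0)
print(hflip_box((10, 20, 110, 70), width=640)) # (530, 20, 630, 70)
```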
Takeaways
- Use resize with INTER_AREA when shrinking.
- getRotationMatrix2D + warpAffine covers most in-plane 2D rigid and similarity transforms.
- Use warpPerspective for plane-to-plane rectification when perspective is strong.
Image filtering
Convolution in one minute
At each location, place the kernel over the image, multiply the overlapping values, and sum them (often with normalization); that sum becomes the new center pixel. Correlation is the same idea without flipping the kernel; OpenCV’s filter2D actually computes correlation, using the kernel exactly as given. Borders need a policy: extend edges, reflect, wrap, or pad with a constant.
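The procedure above translates into a few lines of NumPy. This sketch (hypothetical helper names, odd-sized kernels assumed) pads the image, accumulates shifted copies weighted by the kernel, and defines convolution as correlation with a flipped kernel. An asymmetric kernel shows the difference; for symmetric kernels such as box or Gaussian the two give identical results.

```python
import numpy as np

def correlate2d(img, kernel, pad_mode="reflect"):
    """Slide `kernel` over `img` and sum the products (no kernel flip).

    pad_mode is passed to np.pad; "reflect" mirrors without repeating the edge pixel.
    """
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img.astype(np.float64), ((ph, ph), (pw, pw)), mode=pad_mode)
    h, w = img.shape
    out = np.zeros((h, w), dtype=np.float64)
    for dy in range(kh):
        for dx in range(kw):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out

def convolve2d(img, kernel, pad_mode="reflect"):
    """True convolution: correlation with the kernel flipped in both axes."""
    return correlate2d(img, kernel[::-1, ::-1], pad_mode)

impulse = np.zeros((5, 5))
impulse[2, 2] = 1.0
right = np.zeros((3, 3))
right[1, 2] = 1.0   # asymmetric kernel: all weight on the right-hand neighbor

corr = correlate2d(impulse, right)
conv = convolve2d(impulse, right)
print(np.argwhere(corr == 1).tolist(), np.argwhere(conv == 1).tolist())  # [[2, 1]] [[2, 3]]
```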
Low-pass (blur)
Weights positive and localized—averages neighbors, suppresses high frequencies (noise, fine texture).
High-pass (sharpen)
Emphasizes differences from neighbors—edges pop; also amplifies noise if overdone.
Box and Gaussian blur
Box blur (cv2.blur) uses equal weights—fast but can look “blocky.” Gaussian blur weights fall off with distance from the center; sigma controls spread. Larger kernels or sigmas mean more smoothing and softer edges.
import cv2
img = cv2.imread("photo.jpg")
box = cv2.blur(img, (5, 5))
gauss = cv2.GaussianBlur(img, (0, 0), sigmaX=1.5) # ksize (0,0) → from sigma
gauss_k = cv2.GaussianBlur(img, (7, 7), 0)
Gaussian kernels are separable (apply 1D horizontal then vertical)—that is why large Gaussian blurs stay efficient.
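Separability can be checked numerically: the 2-D Gaussian kernel is the outer product of a 1-D Gaussian with itself, so filtering rows and then columns with the 1-D kernel matches a single pass with the full 2-D kernel. The sketch below uses hypothetical helpers and reflect padding; for a 7-tap kernel that is 14 multiplies per pixel instead of 49.

```python
import numpy as np

def gaussian_1d(ksize, sigma):
    """Sampled 1-D Gaussian normalized to sum to 1 (ksize assumed odd)."""
    x = np.arange(ksize) - (ksize - 1) / 2.0
    g = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return g / g.sum()

def correlate_rows(img, k):
    """1-D correlation along each row, reflect-padded at the left and right edges."""
    r = len(k) // 2
    p = np.pad(img, ((0, 0), (r, r)), mode="reflect")
    return sum(k[i] * p[:, i:i + img.shape[1]] for i in range(len(k)))

def correlate_full(img, k2):
    """Direct 2-D correlation with the full kernel (k*k multiplies per pixel)."""
    r = k2.shape[0] // 2
    p = np.pad(img, r, mode="reflect")
    h, w = img.shape
    return sum(k2[dy, dx] * p[dy:dy + h, dx:dx + w]
               for dy in range(k2.shape[0]) for dx in range(k2.shape[1]))

g = gaussian_1d(7, 1.5)
k2 = np.outer(g, g)                     # the 2-D Gaussian as an outer product
img = np.random.default_rng(0).random((32, 32))

separable = correlate_rows(correlate_rows(img, g).T, g).T   # rows, then columns
full = correlate_full(img, k2)
print(np.allclose(separable, full))     # True
```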
Median and edge-aware smoothing
Median blur replaces the center with the neighborhood median—excellent for salt-and-pepper noise while preserving step edges better than linear blur. Kernel size must be odd.
import cv2
img = cv2.imread("noisy.jpg")
med = cv2.medianBlur(img, 5)
# Optional: bilateral filter — smooths flat regions, keeps strong edges (slower)
bilateral = cv2.bilateralFilter(img, d=9, sigmaColor=75, sigmaSpace=75)
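The sketch below compares a 3×3 median with a 3×3 mean on a single salt pixel and on a step edge, using NumPy's sliding_window_view (NumPy 1.20 or later). The helper names are ours, and replicated borders are a choice made here.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def median_filter(img, k=3):
    """k x k median with replicated borders; k must be odd, as with cv2.medianBlur."""
    if k % 2 == 0:
        raise ValueError("kernel size must be odd")
    r = k // 2
    padded = np.pad(img, r, mode="edge")
    return np.median(sliding_window_view(padded, (k, k)), axis=(-2, -1))

def box_filter(img, k=3):
    """k x k mean with replicated borders, for comparison."""
    r = k // 2
    padded = np.pad(img.astype(np.float64), r, mode="edge")
    return sliding_window_view(padded, (k, k)).mean(axis=(-2, -1))

# Salt noise: one bright pixel in a dark patch.
salt = np.zeros((5, 5))
salt[2, 2] = 255
print(median_filter(salt).max(), box_filter(salt).max())   # 0.0, about 28.3

# Step edge: the median keeps it sharp, the mean adds intermediate values.
step = np.zeros((5, 6))
step[:, 3:] = 100
print(np.array_equal(median_filter(step), step))  # True
print(box_filter(step)[2].round(1).tolist())      # [0.0, 0.0, 33.3, 66.7, 100.0, 100.0]
```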
Sharpening with filter2D
A simple 3×3 sharpen kernel is the identity kernel minus the 4-neighbor Laplacian (center positive, neighbors negative), so it adds a high-frequency boost on top of the original image. Tune the strength for your noise level.
import cv2
import numpy as np
img = cv2.imread("photo.jpg")
kernel = np.array([[ 0, -1, 0],
[-1, 5, -1],
[ 0, -1, 0]], dtype=np.float32)
sharp = cv2.filter2D(img, -1, kernel)
Depth -1 means output matches source depth; for float experiments, convert to float32, filter, clip, then back to uint8.
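Writing the kernel as identity minus the Laplacian suggests a strength parameter. This sketch (hypothetical helpers, grayscale input, replicated borders) follows the float route just described: convert, filter, clip, convert back. The step-edge output shows the overshoot on both sides of the edge and why the clip is needed.

```python
import numpy as np

LAPLACIAN_4 = np.array([[0, 1, 0],
                        [1, -4, 1],
                        [0, 1, 0]], dtype=np.float32)
IDENTITY = np.zeros((3, 3), dtype=np.float32)
IDENTITY[1, 1] = 1.0

def sharpen_kernel(amount):
    """Identity minus `amount` times the Laplacian; amount=1 gives the 5/-1 kernel."""
    return IDENTITY - amount * LAPLACIAN_4

def sharpen_u8(gray_u8, amount=1.0):
    """uint8 -> float32, correlate (replicated borders), clip to [0, 255], back to uint8."""
    k = sharpen_kernel(amount)
    h, w = gray_u8.shape
    p = np.pad(gray_u8.astype(np.float32), 1, mode="edge")
    out = sum(k[dy, dx] * p[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3))
    return np.rint(np.clip(out, 0, 255)).astype(np.uint8)

step = np.tile(np.array([50, 50, 50, 200, 200, 200], dtype=np.uint8), (3, 1))
print(sharpen_u8(step)[1].tolist())   # [50, 50, 0, 255, 200, 200]
```

Without the clip, the pixels next to the edge would be -100 and 350, which wrap around instead of saturating if cast straight to uint8.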
Borders and channels
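Filters need pixel values from outside the image. Most OpenCV filter functions accept a borderType argument whose default, BORDER_DEFAULT, is BORDER_REFLECT_101 (reflection without repeating the edge pixel). For color images, the blur functions filter each channel independently with the same kernel; separable Gaussian filtering is the standard implementation.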
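When reproducing OpenCV filtering in NumPy, np.pad's modes correspond to OpenCV's border types as listed below; the patterns in the comments follow the border diagrams in OpenCV's documentation.

```python
import numpy as np

row = np.array([1, 2, 3, 4])
# np.pad mode -> OpenCV border type it reproduces (2 pixels of padding each side).
modes = {
    "edge": "BORDER_REPLICATE",        # 1 1 | 1 2 3 4 | 4 4
    "symmetric": "BORDER_REFLECT",     # 2 1 | 1 2 3 4 | 4 3
    "reflect": "BORDER_REFLECT_101",   # 3 2 | 1 2 3 4 | 3 2  (BORDER_DEFAULT)
    "wrap": "BORDER_WRAP",             # 3 4 | 1 2 3 4 | 1 2
    "constant": "BORDER_CONSTANT",     # 0 0 | 1 2 3 4 | 0 0
}
for np_mode, cv_name in modes.items():
    print(f"{cv_name:20s}", np.pad(row, 2, mode=np_mode).tolist())
```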
Takeaways
- Gaussian blur for general-purpose smoothing before subsampling or noise reduction.
- Median for impulse noise; bilateral when you need edge-preserving smoothing.
- filter2D is the Swiss Army knife for custom kernels (sharpen, emboss, small conv nets).
Edge detection
Gradients on a grid
For a 2D image I(x, y), the gradient ∇I = (∂I/∂x, ∂I/∂y) points in the direction of steepest brightness increase. On a discrete grid we approximate the partial derivatives with finite differences, which is what convolution kernels like Sobel implement (Sobel also smooths perpendicular to the derivative direction). The gradient magnitude |∇I| ≈ √(Gx² + Gy²) highlights edges regardless of orientation; the angle atan2(Gy, Gx) gives the gradient direction, which is perpendicular to the edge itself.
Gx, Gy
Horizontal and vertical derivative images. Large values where intensity jumps across columns or rows.
Magnitude
Combines both directions—useful for visualization and non-max suppression in Canny.
Pre-blur
Derivatives amplify noise. A little Gaussian blur before Sobel/Canny stabilizes results.
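The gradient formulas can be checked on synthetic ramps. This sketch uses np.gradient, which takes central differences in the interior and one-sided differences at the borders (a plain finite difference, not Sobel), then computes the magnitude with np.hypot and the direction with arctan2.

```python
import numpy as np

ys, xs = np.mgrid[0:5, 0:6].astype(np.float64)

# Brightness rises 3 levels per column: I(x, y) = 3x, so Gx = 3 and Gy = 0.
gy, gx = np.gradient(3.0 * xs)          # axis 0 (rows, y) first, then axis 1 (x)
mag = np.hypot(gx, gy)                  # sqrt(Gx^2 + Gy^2)
ang = np.degrees(np.arctan2(gy, gx))    # gradient direction, normal to the edge
print(gx[2, 2], gy[2, 2], mag[2, 2], ang[2, 2])   # 3.0 0.0 3.0 0.0

# Diagonal ramp I = x + y: the gradient points at 45 degrees.
gy2, gx2 = np.gradient(xs + ys)
print(np.degrees(np.arctan2(gy2, gx2))[2, 2])     # about 45
```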
Sobel and Scharr
cv2.Sobel computes separable 3×3 (or larger, odd) derivative filters. Use CV_64F (float) to avoid clipping negative slopes, then rescale for display. Scharr is a 3×3 variant with slightly better rotation invariance for equal cost.
import cv2
import numpy as np
gray = cv2.imread("photo.jpg", cv2.IMREAD_GRAYSCALE)
blur = cv2.GaussianBlur(gray, (5, 5), 0)
gx = cv2.Sobel(blur, cv2.CV_64F, 1, 0, ksize=3)
gy = cv2.Sobel(blur, cv2.CV_64F, 0, 1, ksize=3)
mag = np.sqrt(gx * gx + gy * gy)
mag_u8 = np.uint8(255 * mag / (mag.max() + 1e-6))
gx_s = cv2.Scharr(blur, cv2.CV_64F, 1, 0)
gy_s = cv2.Scharr(blur, cv2.CV_64F, 0, 1)
Convert to uint8 only for saving or imshow; keep floats for further math.
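The Sobel x-kernel is an outer product: smoothing weights [1, 2, 1] down the columns times the difference [-1, 0, 1] across them, and Scharr swaps in [3, 10, 3]. Running both on ramps (sketch below, hypothetical helper) shows why the float output depth matters: a brightness slope of 1 per pixel already gives a Sobel response of 8, a falling ramp gives -8, and a 0-to-255 step can reach 4 × 255 = 1020, none of which fits in uint8.

```python
import numpy as np

smooth = np.array([1.0, 2.0, 1.0])
deriv = np.array([-1.0, 0.0, 1.0])
sobel_x = np.outer(smooth, deriv)                       # [[-1,0,1],[-2,0,2],[-1,0,1]]
scharr_x = np.outer(np.array([3.0, 10.0, 3.0]), deriv)  # Scharr's weights

def correlate_valid(img, k):
    """3x3 correlation over positions where the kernel fits entirely (no padding)."""
    h, w = img.shape
    return sum(k[dy, dx] * img[dy:h - 2 + dy, dx:w - 2 + dx]
               for dy in range(3) for dx in range(3))

ramp_up = np.tile(np.arange(8, dtype=np.float64), (5, 1))   # +1 per column
ramp_down = ramp_up[:, ::-1]                                 # -1 per column
print(correlate_valid(ramp_up, sobel_x)[0, 0], correlate_valid(ramp_down, sobel_x)[0, 0])  # 8.0 -8.0
print(correlate_valid(ramp_up, scharr_x)[0, 0])  # 32.0
```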
Gradient angle (optional)
angle = np.arctan2(gy, gx) # radians, in [-pi, pi]
Laplacian
The Laplacian is the sum of the second derivatives, ∂²I/∂x² + ∂²I/∂y². It is sensitive to noise but highlights fine detail, and its zero-crossings mark edge locations. It is often used for sharpening (subtracting a scaled Laplacian, a close relative of unsharp masking) and, combined with Gaussian smoothing, as a building block in blob detection.
import cv2
gray = cv2.imread("photo.jpg", cv2.IMREAD_GRAYSCALE)
lap = cv2.Laplacian(gray, cv2.CV_64F, ksize=3)
lap_vis = cv2.convertScaleAbs(lap)
convertScaleAbs maps floats to 8-bit for display after Laplacian/Sobel.
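The sketch below applies a 4-neighbor Laplacian to a vertical step and shows the sign change (the zero-crossing) between the two pixels on either side of the edge. It also mimics convertScaleAbs, which OpenCV documents as saturate(|alpha·src + beta|), to show how the negative side becomes a positive display value. The kernel is the one OpenCV documents for ksize=1; the ksize=3 call above uses a Sobel-based aperture, so magnitudes differ. Helper names are ours.

```python
import numpy as np

LAP = np.array([[0, 1, 0],
                [1, -4, 1],
                [0, 1, 0]], dtype=np.float64)

def laplacian(img):
    """4-neighbor Laplacian with replicated borders."""
    p = np.pad(img.astype(np.float64), 1, mode="edge")
    h, w = img.shape
    return sum(LAP[dy, dx] * p[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3))

def convert_scale_abs(x, alpha=1.0, beta=0.0):
    """Approximation of cv2.convertScaleAbs: |alpha*x + beta|, rounded, saturated to uint8."""
    return np.clip(np.rint(np.abs(alpha * x + beta)), 0, 255).astype(np.uint8)

step = np.zeros((5, 6))
step[:, 3:] = 10
lap = laplacian(step)
print(lap[2].tolist())                     # [0.0, 0.0, 10.0, -10.0, 0.0, 0.0]
print(convert_scale_abs(lap)[2].tolist())  # [0, 0, 10, 10, 0, 0]
```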
Canny edge detector
Canny is a multi-stage algorithm: Gaussian smoothing, gradient magnitude and angle, non-maximum suppression (keep only pixels whose magnitude is a local maximum along the gradient direction, which thins edges to about one pixel), and hysteresis thresholding with a low and a high threshold. Pixels above threshold2 are strong edges and pixels below threshold1 are discarded; weak edges in between are kept only if they connect to strong ones, which reduces broken fragments.
import cv2
gray = cv2.imread("photo.jpg", cv2.IMREAD_GRAYSCALE)
blur = cv2.GaussianBlur(gray, (5, 5), 0)
# Classic ratio: t_high : t_low often ~ 2:1 or 3:1; tune per image scale
t1, t2 = 50, 150
edges = cv2.Canny(blur, t1, t2, apertureSize=3, L2gradient=False)
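Hysteresis is the easiest Canny stage to isolate. The sketch below assumes a gradient-magnitude array that has already been thinned by non-maximum suppression, then grows strong pixels through 8-connected weak ones with a breadth-first search. It is an illustration with a hypothetical helper name, not OpenCV's implementation; details such as >= versus > at the thresholds are choices made here.

```python
from collections import deque

import numpy as np

def hysteresis(mag, low, high):
    """Keep pixels >= high, plus pixels >= low that are 8-connected to them."""
    strong = mag >= high
    candidate = mag >= low
    keep = strong.copy()
    queue = deque(zip(*np.nonzero(strong)))
    h, w = mag.shape
    while queue:
        y, x = queue.popleft()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and candidate[ny, nx] and not keep[ny, nx]:
                    keep[ny, nx] = True
                    queue.append((ny, nx))
    return keep

# Weak pixels (60, 70) touch a strong one (200); the isolated weak 80 does not.
mag = np.array([[0, 60, 70, 200, 0, 0, 80],
                [0, 0, 0, 0, 0, 0, 0]], dtype=np.float64)
print(hysteresis(mag, 50, 150).astype(int).tolist())
# [[0, 1, 1, 1, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0]]
```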
More Canny examples
import cv2
gray = cv2.imread("photo.jpg", cv2.IMREAD_GRAYSCALE)
blur = cv2.GaussianBlur(gray, (5, 5), 0)
# Finer edges (noisier): lower thresholds
fine = cv2.Canny(blur, 30, 90)
# Fewer, stronger edges: raise thresholds
coarse = cv2.Canny(blur, 100, 200)
# L2gradient=True uses sqrt(Gx^2+Gy^2) for magnitude; the default uses |Gx|+|Gy|
edges_l2 = cv2.Canny(blur, 50, 150, apertureSize=3, L2gradient=True)
# Larger Sobel aperture inside Canny (5 or 7): smoother gradients, but larger
# magnitudes, so thresholds usually need raising
edges_ap5 = cv2.Canny(blur, 50, 150, apertureSize=5)
Practical pipeline
For document or industrial images, thresholds are easier to set if intensities are normalized. One pattern: blur → optional CLAHE (see histogram chapter) → Canny. For color images, either convert to grayscale or run Canny per channel and combine with bitwise OR.
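For the per-channel option, the combination step is a bitwise OR across the channel edge maps. The sketch below (hypothetical helper) takes the detector as a parameter, so in practice you might pass a function that calls cv2.Canny on one channel; the demo uses a simple brightness threshold as a stand-in detector.

```python
import numpy as np

def per_channel_edges(img, detector):
    """Run `detector` on each channel of an (H, W, C) uint8 image and OR the maps.

    `detector` maps a 2-D uint8 array to a 0/255 edge map, for example
    lambda ch: cv2.Canny(ch, 50, 150).
    """
    maps = [detector(np.ascontiguousarray(img[:, :, c])) for c in range(img.shape[2])]
    return np.bitwise_or.reduce(np.stack(maps), axis=0)

def bright_pixels(ch):
    """Stand-in detector for this demo: 255 where the channel exceeds 128."""
    return np.where(ch > 128, 255, 0).astype(np.uint8)

img = np.zeros((1, 3, 3), dtype=np.uint8)
img[0, 0, 0] = 200   # only the blue channel is bright at this pixel
img[0, 2, 2] = 200   # only the red channel is bright at this pixel
print(per_channel_edges(img, bright_pixels).tolist())   # [[255, 0, 255]]
```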
import cv2
import numpy as np
bgr = cv2.imread("scene.jpg")
gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
gray = cv2.GaussianBlur(gray, (5, 5), 0)
med = np.median(gray)
t1 = int(max(0, 0.66 * med))
t2 = int(min(255, 1.33 * med))
auto_edges = cv2.Canny(gray, t1, t2)
Median-based heuristic for thresholds—starting point only; validate on your data.
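The same heuristic as a reusable function (hypothetical name; the default fractions are copied from the code above):

```python
import numpy as np

def auto_canny_thresholds(gray, low_frac=0.66, high_frac=1.33):
    """Median-based (low, high) Canny thresholds, clamped to the 8-bit range."""
    med = float(np.median(gray))
    low = int(max(0, low_frac * med))
    high = int(min(255, high_frac * med))
    return low, high

print(auto_canny_thresholds(np.full((4, 4), 100, dtype=np.uint8)))  # (66, 133)
print(auto_canny_thresholds(np.full((4, 4), 250, dtype=np.uint8)))  # (165, 255)
```

On very dark images the median approaches 0 and both thresholds collapse toward 0, so consider a floor or a fallback to fixed thresholds in that case.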
Takeaways
- Use float Sobel/Scharr, then visualize with convertScaleAbs or manual scaling.
- Canny needs two thresholds and benefits from light pre-blur.
- Tune thresholds per resolution and lighting; consider auto heuristics then refine.