Computer Vision Chapter 3

Segmentation Basics

Thresholding (global, Otsu, adaptive) and morphological operations for binary masks and region cleanup.

Image thresholding

Global cv2.threshold

The signature is retval, dst = cv2.threshold(src, thresh, maxval, type). With THRESH_BINARY, pixels strictly greater than thresh become maxval (often 255) and all others become 0. THRESH_BINARY_INV swaps the two outputs, which is useful when objects are darker than the background.

import cv2

gray = cv2.imread("scan.png", cv2.IMREAD_GRAYSCALE)
t, bin_img = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)
_, bin_inv = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY_INV)

Truncate and zero modes

# Values above 180 become 180; below unchanged (still grayscale)
_, trunc = cv2.threshold(gray, 180, 255, cv2.THRESH_TRUNC)

# TOZERO: at or below threshold → 0; above → unchanged
_, tozero = cv2.threshold(gray, 100, 255, cv2.THRESH_TOZERO)
# TOZERO_INV: above threshold → 0; at or below → unchanged
_, tozero_inv = cv2.threshold(gray, 100, 255, cv2.THRESH_TOZERO_INV)

Otsu and Triangle (automatic thresh)

Otsu picks the threshold that maximizes the between-class variance of the histogram, so it works best for roughly bimodal histograms (clear foreground/background). Combine THRESH_OTSU with THRESH_BINARY (or THRESH_BINARY_INV); the returned t is the chosen value. Triangle draws a line from the histogram peak to the far end of the histogram and picks the bin farthest from that line; it suits histograms with one dominant peak and a long tail (e.g. small bright objects on a large dark background).

import cv2

gray = cv2.imread("cells.png", cv2.IMREAD_GRAYSCALE)
blur = cv2.GaussianBlur(gray, (5, 5), 0)

t_otsu, bin_o = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
print("Otsu threshold:", t_otsu)

t_tri, bin_t = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_TRIANGLE)
print("Triangle threshold:", t_tri)

The thresh argument (the second positional argument, after src) is ignored when Otsu/Triangle is used; OpenCV still requires a placeholder (commonly 0).
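
To make "maximizing between-class variance" less abstract, the following is a rough NumPy sketch of the Otsu criterion, not OpenCV's implementation. The helper name `otsu_threshold` and the synthetic two-mode image are assumptions made for illustration, and the value may differ from cv2.threshold by a level or so when several thresholds tie.

```python
import numpy as np

def otsu_threshold(gray):
    """Return t maximizing between-class variance for classes {<= t} and {> t}."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    levels = np.arange(256, dtype=np.float64)

    n0 = np.cumsum(hist)               # pixel count in the "<= t" class
    n1 = hist.sum() - n0               # pixel count in the "> t" class
    s0 = np.cumsum(hist * levels)      # intensity sum in the "<= t" class
    s1 = (hist * levels).sum() - s0    # intensity sum in the "> t" class

    valid = (n0 > 0) & (n1 > 0)        # both classes must be non-empty
    between = np.zeros(256)
    mean0 = s0[valid] / n0[valid]
    mean1 = s1[valid] / n1[valid]
    # Unnormalized between-class variance: n0 * n1 * (mean0 - mean1)^2
    between[valid] = n0[valid] * n1[valid] * (mean0 - mean1) ** 2
    return int(np.argmax(between))

# Synthetic bimodal "image": a dark mode near 60 and a bright mode near 190.
rng = np.random.default_rng(0)
dark = rng.normal(60, 10, size=5000)
bright = rng.normal(190, 10, size=5000)
gray = np.clip(np.concatenate([dark, bright]), 0, 255).astype(np.uint8).reshape(100, 100)

t = otsu_threshold(gray)
print("NumPy Otsu threshold:", t)
```

On a histogram this cleanly separated, any threshold in the empty gap between the modes scores the same, which is why the exact number matters less than the fact that it lands between the two peaks.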

Otsu on inverted image

# If objects are dark on light paper, invert first (shown here) or use THRESH_BINARY_INV
inv = cv2.bitwise_not(gray)
t2, bin2 = cv2.threshold(inv, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

Adaptive thresholding

When illumination varies across the scene, a single global t fails. cv2.adaptiveThreshold computes a separate threshold for each pixel from the blockSize × blockSize neighborhood around it (blockSize must be odd and greater than 1, e.g. 11, 21). ADAPTIVE_THRESH_MEAN_C uses the neighborhood mean minus C; ADAPTIVE_THRESH_GAUSSIAN_C uses a Gaussian-weighted mean minus C.

import cv2

gray = cv2.imread("receipt.jpg", cv2.IMREAD_GRAYSCALE)
gray = cv2.GaussianBlur(gray, (3, 3), 0)

block, C = 15, 4
ad_mean = cv2.adaptiveThreshold(
    gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, block, C)
ad_gauss = cv2.adaptiveThreshold(
    gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 21, 5)

Increase blockSize for smoother, more global behavior. With THRESH_BINARY, a larger C lowers the local threshold so more pixels come out white; a smaller or negative C does the opposite.

Combined pipelines

Real workflows often chain blur → threshold → morphology (covered later in this chapter). Example: isolate dark text with a light blur followed by inverted Otsu thresholding.

import cv2

gray = cv2.imread("page.png", cv2.IMREAD_GRAYSCALE)
blur = cv2.GaussianBlur(gray, (5, 5), 0)
_, bw = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
# bw: likely text as white — ready for morphological cleanup

Color “thresholding” with inRange

For colored objects, threshold each channel in HSV (or LAB) space. cv2.inRange returns a binary mask where all channel constraints hold.

import cv2
import numpy as np

bgr = cv2.imread("fruit.jpg")
hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
lower = np.array([35, 60, 60])
upper = np.array([85, 255, 255])
mask = cv2.inRange(hsv, lower, upper)
fg = cv2.bitwise_and(bgr, bgr, mask=mask)

Tune lower/upper with sliders or by sampling pixels from the object; watch OpenCV’s hue scale (H runs 0–179 for 8-bit images, i.e. degrees divided by 2).

Takeaways

  • Otsu for clean bimodal scenes; Triangle for skewed histograms.
  • Adaptive for shadows and uneven lighting on documents or outdoor text.
  • Use HSV + inRange when “brightness threshold” is not enough—separate hue from value.

Quick FAQ

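A small Gaussian blur before thresholding suppresses pixel-level noise, so the histogram and local means are more stable and the binary mask has fewer speckles. For true salt-and-pepper noise, a median blur (cv2.medianBlur) is usually the better choice.

cv2.threshold is normally applied to a single-channel image (8-bit is the common case), and the Otsu and Triangle flags require single-channel input. For BGR, convert to gray or threshold each channel separately and combine masks with bitwise logic.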

Morphological operations

Structuring elements

cv2.getStructuringElement(shape, ksize) builds the probe: MORPH_RECT, MORPH_ELLIPSE, or MORPH_CROSS. Larger kernels have a stronger geometric effect; use odd sizes (3, 5, 7, …) so the kernel has a well-defined center pixel.

import cv2

k3 = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
k5e = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
k7c = cv2.getStructuringElement(cv2.MORPH_CROSS, (7, 7))

Erosion

Shrinks bright regions, removes thin protrusions, breaks narrow bridges.

Dilation

Grows bright regions, fills small holes, reconnects broken strokes.

Iterations

Pass iterations=n to repeat erosion or dilation for a stronger effect without resorting to huge kernels.

Erosion and dilation

import cv2

# bw: uint8 binary, foreground white (255)
bw = cv2.imread("mask.png", cv2.IMREAD_GRAYSCALE)
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))

er = cv2.erode(bw, kernel, iterations=1)
dl = cv2.dilate(bw, kernel, iterations=1)

er2 = cv2.erode(bw, kernel, iterations=2)
dl3 = cv2.dilate(bw, kernel, iterations=3)

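Border pixels: the default border type is BORDER_CONSTANT, but the default border value is a special sentinel that OpenCV replaces with a neutral value (the maximum for erosion, the minimum for dilation), so the image edge does not erode by default. If you pass an explicit borderValue such as 0, erosion treats everything outside the image as background, and repeated iterations can “eat” foreground that touches the edge.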

Opening and closing

Opening = erosion then dilation—removes small bright noise and smooths boundaries without growing the main objects much. Closing = dilation then erosion—fills small dark holes inside foreground and bridges narrow gaps.

import cv2

# bw: uint8 binary mask with white foreground, as in the erosion/dilation example
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
opened = cv2.morphologyEx(bw, cv2.MORPH_OPEN, kernel, iterations=1)
closed = cv2.morphologyEx(bw, cv2.MORPH_CLOSE, kernel, iterations=1)

# Equivalent explicit opening:
# tmp = cv2.erode(bw, kernel); opened = cv2.dilate(tmp, kernel)

Noise vs broken strokes

k3 = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
k5e = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))

# Salt on black background: opening cleans white specks
clean_fg = cv2.morphologyEx(bw, cv2.MORPH_OPEN, k3, iterations=1)

# Gaps in text strokes: closing helps
solid_text = cv2.morphologyEx(bw, cv2.MORPH_CLOSE, k5e, iterations=1)

Gradient, top-hat, black-hat

Morphological gradient ≈ dilation minus erosion—outline of objects. Top-hat = image minus its opening—highlights small bright details on a dark background. Black-hat = closing minus image—dark details on bright background.

import cv2

# bw and kernel are carried over from the opening/closing example
grad = cv2.morphologyEx(bw, cv2.MORPH_GRADIENT, kernel)

gray = cv2.imread("texture.png", cv2.IMREAD_GRAYSCALE)
k = cv2.getStructuringElement(cv2.MORPH_RECT, (9, 9))
tophat = cv2.morphologyEx(gray, cv2.MORPH_TOPHAT, k)
blackhat = cv2.morphologyEx(gray, cv2.MORPH_BLACKHAT, k)

Hit-or-miss (shape matching)

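MORPH_HITMISS marks pixels whose neighborhood matches a template kernel made of positions that must be foreground (hits), positions that must be background (misses), and “don’t care” positions. In OpenCV’s documented encoding, 1 means foreground, -1 means background, and 0 means don’t care; the input should be an 8-bit single-channel binary image. It is commonly used on skeletonized text or thin structures to locate T-junctions or endpoints.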

import cv2
import numpy as np

# Placeholder kernel — replace with a pattern from OpenCV hit-or-miss docs for your task
hitmiss_kernel = np.array([[0, 1, 0],
                           [1, 1, 0],
                           [0, 0, 0]], dtype=np.int8)
out = cv2.morphologyEx(bw, cv2.MORPH_HITMISS, hitmiss_kernel)

Tutorials for other libraries encode hits, misses, and don’t-care cells differently; always cross-check the official cv.morphologyEx hit-or-miss documentation for your build.

End-to-end: clean a binary scan

import cv2

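gray = cv2.imread("scan.png", cv2.IMREAD_GRAYSCALE)
blur = cv2.GaussianBlur(gray, (3, 3), 0)
# THRESH_BINARY keeps light paper white; use THRESH_BINARY_INV if dark ink should be the foreground
_, bw = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# 2x2 kernel: very gentle cleanup for thin strokes (even sizes have an off-center anchor)
k = cv2.getStructuringElement(cv2.MORPH_RECT, (2, 2))
bw = cv2.morphologyEx(bw, cv2.MORPH_OPEN, k, iterations=1)   # drop tiny white specks
bw = cv2.morphologyEx(bw, cv2.MORPH_CLOSE, k, iterations=1)  # fill tiny dark pinholes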

Takeaways

  • Opening removes small foreground noise; closing fills holes in foreground.
  • Prefer several small iterations or moderate kernels over one huge kernel for smoother control.
  • Gradient / top-hat / black-hat extract boundaries and fine structure on binary or gray images.

Quick FAQ

OpenCV morphology treats high values as “set” in the usual binary convention. If your mask has objects as 0 and background as 255, invert with cv2.bitwise_not first so semantics match your intent.

Ellipses round off corners isotropically—often nicer for blob-like objects. Rectangles align with axis-aligned text or grid structures.
