Computer Vision Chapter 15

Segmentation overview

Segmentation partitions an image into meaningful regions: pixels belonging to the same object, surface, or semantic class. Classical tools use intensity, color, edges, and region growing; modern semantic and instance segmentation use deep nets to label every pixel (and separate individual objects). This chapter surveys OpenCV building blocks—thresholding chains, connected components, watershed, GrabCut, and k-means in color space—before you dive into FCN-style models in the next tutorials.

Classical vs learning-based

Classical / interactive

Fast, no training data; needs hand-tuned assumptions (color clusters, user scribbles, smooth regions). Great for controlled capture or preprocessing.

Deep segmentation

Learns appearance and context from labeled images; handles diverse scenes. Heavier compute and dataset requirements.

Threshold + morphology + contours

The simplest “segmentation” is a binary mask: foreground vs background. Clean it with morphology, then extract outer boundaries or filled regions.

import cv2

gray = cv2.imread("blob.png", cv2.IMREAD_GRAYSCALE)
blur = cv2.GaussianBlur(gray, (5, 5), 0)
_, bw = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

k = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
bw = cv2.morphologyEx(bw, cv2.MORPH_OPEN, k, iterations=1)

cnts, _ = cv2.findContours(bw, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
vis = cv2.cvtColor(gray, cv2.COLOR_GRAY2BGR)
cv2.drawContours(vis, cnts, -1, (0, 255, 0), 2)

Connected components

Label each 4- or 8-connected blob; filter by area for counting objects or removing speckles.

import cv2

n, labels, stats, centroids = cv2.connectedComponentsWithStats(bw, connectivity=8)
# each stats row i is [x, y, width, height, area] for label i (row 0 is the background)
h, w = bw.shape
out = cv2.cvtColor(bw, cv2.COLOR_GRAY2BGR)
for i in range(1, n):
    if stats[i, cv2.CC_STAT_AREA] < 100:
        continue
    x, y, ww, hh = stats[i, 0], stats[i, 1], stats[i, 2], stats[i, 3]
    cv2.rectangle(out, (x, y), (x + ww, y + hh), (255, 0, 0), 1)

Watershed with markers

Watershed floods a height map from seed basins; a common height map is the inverted distance transform of the binary mask. Run without constraints it oversegments badly, so OpenCV's cv2.watershed is marker-based only: provide sure foreground, sure background, and unknown regions to get stable basins.

import cv2
import numpy as np

img = cv2.imread("coins.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
_, th = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

# noise removal
kernel = np.ones((3, 3), np.uint8)
opening = cv2.morphologyEx(th, cv2.MORPH_OPEN, kernel, iterations=2)
sure_bg = cv2.dilate(opening, kernel, iterations=3)

dist = cv2.distanceTransform(opening, cv2.DIST_L2, 5)
_, sure_fg = cv2.threshold(dist, 0.5 * dist.max(), 255, 0)
sure_fg = np.uint8(sure_fg)
unknown = cv2.subtract(sure_bg, sure_fg)

_, markers = cv2.connectedComponents(sure_fg)
markers = markers + 1
markers[unknown == 255] = 0

markers = cv2.watershed(img, markers)
img[markers == -1] = [0, 0, 255]

Boundaries are marked as -1 in markers; here drawn in red on the original BGR image.

GrabCut (interactive box)

GrabCut iteratively refines a Gaussian mixture model of foreground/background color. Start from a rectangle or a user mask.

import cv2
import numpy as np

img = cv2.imread("portrait.jpg")
mask = np.zeros(img.shape[:2], np.uint8)
bgd = np.zeros((1, 65), np.float64)
fgd = np.zeros((1, 65), np.float64)
h, w = img.shape[:2]
rect = (int(0.1 * w), int(0.05 * h), int(0.8 * w), int(0.9 * h))

cv2.grabCut(img, mask, rect, bgd, fgd, 5, cv2.GC_INIT_WITH_RECT)
binmask = np.where((mask == cv2.GC_BGD) | (mask == cv2.GC_PR_BGD), 0, 1).astype("uint8")
fg = img * binmask[:, :, np.newaxis]

Refine with sure-foreground strokes (concept)

# mask values: cv2.GC_BGD (0), cv2.GC_FGD (1), cv2.GC_PR_BGD (2), cv2.GC_PR_FGD (3).
# Paint user scribbles into mask with GC_FGD / GC_BGD, then rerun:
# cv2.grabCut(img, mask, None, bgd, fgd, 5, cv2.GC_INIT_WITH_MASK)

k-means clustering in LAB

Flatten pixels to feature vectors (e.g. L,a,b), run cv2.kmeans, map labels back to an image—simple color segmentation.

import cv2
import numpy as np

bgr = cv2.imread("fruit.jpg")
lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB)
h, w, c = lab.shape
Z = lab.reshape(-1, 3).astype(np.float32)

criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 0.5)
K = 4
_, labels, centers = cv2.kmeans(Z, K, None, criteria, 10, cv2.KMEANS_PP_CENTERS)

centers_u8 = np.uint8(centers)
seg = centers_u8[labels.flatten()].reshape(h, w, 3)
seg_bgr = cv2.cvtColor(seg, cv2.COLOR_LAB2BGR)

What’s next in this series

Semantic segmentation assigns a class label to every pixel (road, sky, person) with networks like FCN, U-Net, DeepLab. Instance segmentation additionally separates individual object masks (Mask R-CNN). Follow the next hub pages for those topics when you are ready to move from classical pipelines to trained models.

Takeaways

  • Always combine raw thresholds with morphology and area filters for robust masks.
  • Watershed needs markers—distance transform + connected components is a standard recipe.
  • GrabCut and k-means leverage color statistics; deep models add semantic understanding.

Quick FAQ

What do I tune when watershed merges or splits objects?

Improve the sure-foreground mask (raise the distance-transform threshold, use better binarization) or incorporate gradient-based markers. Markers are the main control lever.

How do I choose K for k-means segmentation?

Try the elbow method on within-cluster error, or pick K from the number of dominant colors you expect. For objects that share colors, pure k-means will merge them; use learning-based segmentation instead.