Computer Vision Chapter 15

Segmentation overview

Segmentation partitions an image into meaningful regions: pixels belonging to the same object, surface, or semantic class. Classical tools use intensity, color, edges, and region growing; modern semantic and instance segmentation use deep nets to label every pixel (and separate individual objects). This chapter surveys OpenCV building blocks—thresholding chains, connected components, watershed, GrabCut, and k-means in color space—before you dive into FCN-style models in the next tutorials.

Classical vs learning-based

Classical / interactive

Fast, no training data; needs hand-tuned assumptions (color clusters, user scribbles, smooth regions). Great for controlled capture or preprocessing.

Deep segmentation

Learns appearance and context from labeled images; handles diverse scenes. Heavier compute and dataset requirements.

Threshold + morphology + contours

The simplest “segmentation” is a binary mask: foreground vs background. Clean it with morphology, then extract outer boundaries or filled regions.

import cv2

gray = cv2.imread("blob.png", cv2.IMREAD_GRAYSCALE)
blur = cv2.GaussianBlur(gray, (5, 5), 0)
_, bw = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

k = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
bw = cv2.morphologyEx(bw, cv2.MORPH_OPEN, k, iterations=1)

cnts, _ = cv2.findContours(bw, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
vis = cv2.cvtColor(gray, cv2.COLOR_GRAY2BGR)
cv2.drawContours(vis, cnts, -1, (0, 255, 0), 2)

Connected components

Label each 4- or 8-connected blob; filter by area for counting objects or removing speckles.

import cv2

n, labels, stats, centroids = cv2.connectedComponentsWithStats(bw, connectivity=8)
# each stats row i is [x, y, width, height, area] for label i (row 0 is the background)
h, w = bw.shape
out = cv2.cvtColor(bw, cv2.COLOR_GRAY2BGR)
for i in range(1, n):
    if stats[i, cv2.CC_STAT_AREA] < 100:
        continue
    x, y, ww, hh = stats[i, 0], stats[i, 1], stats[i, 2], stats[i, 3]
    cv2.rectangle(out, (x, y), (x + ww, y + hh), (255, 0, 0), 1)

Watershed with markers

Watershed floods a height map from seed basins; a common height map is the inverted distance transform of the binary mask. Run without constraints it oversegments badly, so OpenCV's cv2.watershed is marker-based only: provide sure foreground, sure background, and unknown regions to get stable basins.

import cv2
import numpy as np

img = cv2.imread("coins.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
_, th = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

# noise removal
kernel = np.ones((3, 3), np.uint8)
opening = cv2.morphologyEx(th, cv2.MORPH_OPEN, kernel, iterations=2)
sure_bg = cv2.dilate(opening, kernel, iterations=3)

dist = cv2.distanceTransform(opening, cv2.DIST_L2, 5)
_, sure_fg = cv2.threshold(dist, 0.5 * dist.max(), 255, 0)
sure_fg = np.uint8(sure_fg)
unknown = cv2.subtract(sure_bg, sure_fg)

_, markers = cv2.connectedComponents(sure_fg)
markers = markers + 1
markers[unknown == 255] = 0

markers = cv2.watershed(img, markers)
img[markers == -1] = [0, 0, 255]

Boundaries are marked as -1 in markers; here drawn in red on the original BGR image.

GrabCut (interactive box)

GrabCut iteratively refines a Gaussian mixture model of foreground/background color. Start from a rectangle or a user mask.

import cv2
import numpy as np

img = cv2.imread("portrait.jpg")
mask = np.zeros(img.shape[:2], np.uint8)
bgd = np.zeros((1, 65), np.float64)
fgd = np.zeros((1, 65), np.float64)
h, w = img.shape[:2]
rect = (int(0.1 * w), int(0.05 * h), int(0.8 * w), int(0.9 * h))

cv2.grabCut(img, mask, rect, bgd, fgd, 5, cv2.GC_INIT_WITH_RECT)
binmask = np.where((mask == cv2.GC_BGD) | (mask == cv2.GC_PR_BGD), 0, 1).astype("uint8")
fg = img * binmask[:, :, np.newaxis]

Refine with sure-foreground strokes (concept)

# mask values: cv2.GC_BGD (0), cv2.GC_FGD (1), cv2.GC_PR_BGD (2), cv2.GC_PR_FGD (3).
# Paint user scribbles into mask with GC_FGD / GC_BGD, then rerun:
# cv2.grabCut(img, mask, None, bgd, fgd, 5, cv2.GC_INIT_WITH_MASK)

k-means clustering in LAB

Flatten pixels to feature vectors (e.g. L,a,b), run cv2.kmeans, map labels back to an image—simple color segmentation.

import cv2
import numpy as np

bgr = cv2.imread("fruit.jpg")
lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB)
h, w, c = lab.shape
Z = lab.reshape(-1, 3).astype(np.float32)

criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 0.5)
K = 4
_, labels, centers = cv2.kmeans(Z, K, None, criteria, 10, cv2.KMEANS_PP_CENTERS)

centers_u8 = np.uint8(centers)
seg = centers_u8[labels.flatten()].reshape(h, w, 3)
seg_bgr = cv2.cvtColor(seg, cv2.COLOR_LAB2BGR)

What’s next in this series

Semantic segmentation assigns a class label to every pixel (road, sky, person) with networks like FCN, U-Net, DeepLab. Instance segmentation additionally separates individual object masks (Mask R-CNN). Follow the next hub pages for those topics when you are ready to move from classical pipelines to trained models.

Takeaways

  • Always combine raw thresholds with morphology and area filters for robust masks.
  • Watershed needs markers—distance transform + connected components is a standard recipe.
  • GrabCut and k-means leverage color statistics; deep models add semantic understanding.

Quick FAQ

What do I tune when watershed merges or splits objects?

Improve the sure-foreground mask (raise the distance-transform threshold, use better binarization) or incorporate gradient-based markers. Markers are the main control lever.

How do I choose K for k-means segmentation?

Try the elbow method on within-cluster error, or pick K from the number of dominant colors you expect. For objects that share colors, pure k-means will merge them; use learning-based segmentation instead.