Classical vs learning-based
Classical / interactive
Fast, no training data; needs hand-tuned assumptions (color clusters, user scribbles, smooth regions). Great for controlled capture or preprocessing.
Deep segmentation
Learns appearance and context from labeled images; handles diverse scenes. Heavier compute and dataset requirements.
Threshold + morphology + contours
The simplest “segmentation” is a binary mask: foreground vs background. Clean it with morphology, then extract outer boundaries or filled regions.
import cv2
gray = cv2.imread("blob.png", cv2.IMREAD_GRAYSCALE)
blur = cv2.GaussianBlur(gray, (5, 5), 0)
_, bw = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
k = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
bw = cv2.morphologyEx(bw, cv2.MORPH_OPEN, k, iterations=1)
cnts, _ = cv2.findContours(bw, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
vis = cv2.cvtColor(gray, cv2.COLOR_GRAY2BGR)
cv2.drawContours(vis, cnts, -1, (0, 255, 0), 2)
Connected components
Label each 4- or 8-connected blob; filter by area for counting objects or removing speckles.
import cv2
n, labels, stats, centroids = cv2.connectedComponentsWithStats(bw, connectivity=8)
# stats columns: CC_STAT_LEFT, CC_STAT_TOP, CC_STAT_WIDTH, CC_STAT_HEIGHT, CC_STAT_AREA
out = cv2.cvtColor(bw, cv2.COLOR_GRAY2BGR)
for i in range(1, n):  # label 0 is the background
    if stats[i, cv2.CC_STAT_AREA] < 100:
        continue  # skip small speckles
    x, y, ww, hh = stats[i, :4]
    cv2.rectangle(out, (x, y), (x + ww, y + hh), (255, 0, 0), 1)
Watershed with markers
Watershed treats an image (here, the inverted distance transform) as a height map and floods it from markers; without markers, every local minimum becomes its own basin and the result oversegments. Provide sure-foreground, sure-background, and unknown regions so each object grows from a stable basin.
import cv2
import numpy as np
img = cv2.imread("coins.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
_, th = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
# noise removal
kernel = np.ones((3, 3), np.uint8)
opening = cv2.morphologyEx(th, cv2.MORPH_OPEN, kernel, iterations=2)
sure_bg = cv2.dilate(opening, kernel, iterations=3)
dist = cv2.distanceTransform(opening, cv2.DIST_L2, 5)
_, sure_fg = cv2.threshold(dist, 0.5 * dist.max(), 255, 0)
sure_fg = np.uint8(sure_fg)
unknown = cv2.subtract(sure_bg, sure_fg)
_, markers = cv2.connectedComponents(sure_fg)
markers = markers + 1
markers[unknown == 255] = 0
markers = cv2.watershed(img, markers)
img[markers == -1] = [0, 0, 255]
Boundaries are marked as -1 in markers; here drawn in red on the original BGR image.
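Since every basin keeps its own positive integer label, per-object masks fall out of a comparison against markers. A small illustration on a hand-built markers array shaped like watershed output (the label values here are assumptions for illustration, not from a real run):

```python
import numpy as np

# Synthetic 'markers' laid out like cv2.watershed output: 1 = background,
# 2.. = object basins, -1 = boundary pixels
markers = np.ones((6, 6), np.int32)
markers[1:3, 1:3] = 2   # first object basin
markers[3:5, 3:5] = 3   # second object basin
markers[0, :] = -1      # boundary pixels

# One binary mask per segmented object (labels >= 2 after the +1 shift above)
object_masks = {lab: (markers == lab).astype(np.uint8) * 255
                for lab in np.unique(markers) if lab >= 2}
```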
GrabCut (interactive box)
GrabCut iteratively refines a Gaussian mixture model of foreground/background color. Start from a rectangle or a user mask.
import cv2
import numpy as np
img = cv2.imread("portrait.jpg")
mask = np.zeros(img.shape[:2], np.uint8)
bgd = np.zeros((1, 65), np.float64)
fgd = np.zeros((1, 65), np.float64)
h, w = img.shape[:2]
rect = (int(0.1 * w), int(0.05 * h), int(0.8 * w), int(0.9 * h))
cv2.grabCut(img, mask, rect, bgd, fgd, 5, cv2.GC_INIT_WITH_RECT)
binmask = np.where((mask == cv2.GC_BGD) | (mask == cv2.GC_PR_BGD), 0, 1).astype("uint8")
fg = img * binmask[:, :, np.newaxis]
Refine with sure-foreground strokes (concept)
# mask values: GC_BGD, GC_FGD, GC_PR_BGD, GC_PR_FGD — set user scribbles then:
# cv2.grabCut(img, mask, None, bgd, fgd, 5, cv2.GC_INIT_WITH_MASK)
k-means clustering in LAB
Flatten pixels to feature vectors (e.g. L,a,b), run cv2.kmeans, map labels back to an image—simple color segmentation.
import cv2
import numpy as np
bgr = cv2.imread("fruit.jpg")
lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB)
h, w, c = lab.shape
Z = lab.reshape(-1, 3).astype(np.float32)
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 0.5)
K = 4
_, labels, centers = cv2.kmeans(Z, K, None, criteria, 10, cv2.KMEANS_PP_CENTERS)
centers_u8 = np.uint8(centers)
seg = centers_u8[labels.flatten()].reshape(h, w, 3)
seg_bgr = cv2.cvtColor(seg, cv2.COLOR_LAB2BGR)
What’s next in this series
Semantic segmentation assigns a class label to every pixel (road, sky, person) with networks like FCN, U-Net, DeepLab. Instance segmentation additionally separates individual object masks (Mask R-CNN). Follow the next hub pages for those topics when you are ready to move from classical pipelines to trained models.
Takeaways
- Always combine raw thresholds with morphology and area filters for robust masks.
- Watershed needs markers—distance transform + connected components is a standard recipe.
- GrabCut and k-means leverage color statistics; deep models add semantic understanding.