Computer Vision Chapter 12

3D Vision & Depth

3D vision introduction, camera calibration, and stereo depth from disparity.

3D vision (introduction)

Pinhole projection

A 3D point P = (X, Y, Z) in the camera frame projects to the image plane via focal length f (in pixels) and principal point (cx, cy):

x = f · X/Z + cx, y = f · Y/Z + cy

In homogeneous coordinates this is a 3×4 projection matrix K [R | t] that combines the intrinsics K with the extrinsics R, t (the rotation and translation from world to camera coordinates). Lens distortion (radial and tangential) must be corrected before accurate geometric reasoning; see the camera calibration tutorial.
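
To make the projection concrete, here is a minimal NumPy sketch of P = K [R | t] applied to a few points. The intrinsics (f = 800 px, principal point at (320, 240)) and the helper name project_points are illustrative assumptions, not values from a real camera.

```python
import numpy as np

def project_points(points_world, K, R, t):
    """Project Nx3 world points to Nx2 pixel coordinates with P = K [R | t]."""
    P = K @ np.hstack([R, t.reshape(3, 1)])          # 3x4 projection matrix
    ones = np.ones((points_world.shape[0], 1))
    X_h = np.hstack([points_world, ones])            # Nx4 homogeneous points
    x_h = (P @ X_h.T).T                              # Nx3 homogeneous pixels
    return x_h[:, :2] / x_h[:, 2:3]                  # divide by depth

# Hypothetical intrinsics: f = 800 px, principal point (320, 240)
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)        # camera frame coincides with the world frame
t = np.zeros(3)

pts = np.array([[0.0, 0.0, 2.0],      # on the optical axis
                [0.5, -0.25, 2.0]])
pixels = project_points(pts, K, R, t)
```

A point on the optical axis lands on the principal point; the second point lands at f · X/Z + cx and f · Y/Z + cy, matching the scalar formula above.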

Depth from stereo

Two calibrated cameras with known relative pose (baseline B) view the same scene. After rectification, a 3D point appears at horizontal positions xL and xR in the left and right images. The disparity d = xL − xR gives depth Z ≈ f · B / d, with f in pixels and Z in the units of B. Dense stereo estimates disparity at every pixel (StereoBM and StereoSGBM in OpenCV); quality depends on scene texture and calibration accuracy.
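
As a worked example of Z ≈ f · B / d, the sketch below converts a small disparity array to depth. The rig values (f = 700 px, B = 0.12 m) and the function name disparity_to_depth are assumptions chosen for illustration.

```python
import numpy as np

def disparity_to_depth(disp, focal_px, baseline):
    """Z = f * B / d for positive disparities; NaN where disparity is invalid."""
    disp = np.asarray(disp, dtype=np.float64)
    depth = np.full(disp.shape, np.nan)
    valid = disp > 0
    depth[valid] = focal_px * baseline / disp[valid]
    return depth

# Hypothetical rig: f = 700 px, B = 0.12 m, so depth comes out in meters
depth = disparity_to_depth([[70.0, 35.0], [0.0, 7.0]],
                           focal_px=700.0, baseline=0.12)
```

Halving the disparity doubles the depth, so far-away points are resolved much more coarsely than near ones.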

Rectification

Warp both images so epipolar lines align horizontally—simplifies matching to 1D search along rows.

Monocular depth

Single-image networks (MiDaS, DPT) predict relative depth without stereo; the result is useful but scale-ambiguous unless extra constraints fix the scale.
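
One way to see what "scale-ambiguous" means in practice: if a few pixels have a trusted reference value, a scale and shift can be fitted by least squares. This is only a sketch under the assumption that prediction and reference differ by an unknown scale and shift; the arrays and the helper fit_scale_shift are hypothetical. For models that output inverse depth, the fit would be done in inverse-depth space.

```python
import numpy as np

def fit_scale_shift(pred, ref):
    """Least-squares s, b such that s * pred + b approximates ref."""
    pred = np.asarray(pred, dtype=np.float64).ravel()
    ref = np.asarray(ref, dtype=np.float64).ravel()
    A = np.stack([pred, np.ones_like(pred)], axis=1)   # columns: pred, 1
    (s, b), *_ = np.linalg.lstsq(A, ref, rcond=None)
    return s, b

pred = np.array([0.1, 0.4, 0.7, 0.9])   # hypothetical relative predictions
ref = 2.0 * pred + 0.5                   # hypothetical reference values at the same pixels
s, b = fit_scale_shift(pred, ref)
aligned = s * pred + b
```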

OpenCV stereo (workflow sketch)

import cv2
import numpy as np

# After calibration: cameraMatrix1, dist1, cameraMatrix2, dist2, R, T
# stereoRectify → R1, R2, P1, P2, Q, roi1, roi2
# initUndistortRectifyMap + remap for left/right rectified images

# Example disparity (tune numDisparities, blockSize)
left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)
stereo = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128, blockSize=5)
disp = stereo.compute(left, right).astype(np.float32) / 16.0

# Q from stereoRectify — reproject to XYZ in homogeneous coords
# points_3d = cv2.reprojectImageTo3D(disp, Q)

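The matcher expects rectified inputs, so left.png and right.png must already be rectified or be passed through the remap step first. Q must come from stereoRectify run on calibrated intrinsics and stereo extrinsics; the placeholder comments mark those missing steps.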

Point clouds and meshes

A point cloud is a set of (x, y, z) samples, often with color (RGB-D) or normals. Formats: PLY, PCD, LAS. Downstream: ICP for alignment, RANSAC for plane fitting, Poisson / Delaunay for meshing. Libraries: Open3D, PCL, CloudCompare.
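
For a quick look at a cloud in any of the viewers above, ASCII PLY is the simplest format to write by hand. The helper below is a minimal sketch (the name write_ply_ascii is made up here); Open3D and PCL ship their own, faster writers.

```python
import numpy as np

def write_ply_ascii(path, points, colors=None):
    """Write Nx3 points (and optional Nx3 uint8 RGB colors) as an ASCII PLY file."""
    points = np.asarray(points, dtype=np.float32)
    header = ["ply", "format ascii 1.0", f"element vertex {len(points)}",
              "property float x", "property float y", "property float z"]
    if colors is not None:
        colors = np.asarray(colors, dtype=np.uint8)
        header += ["property uchar red", "property uchar green",
                   "property uchar blue"]
    header.append("end_header")
    with open(path, "w") as fh:
        fh.write("\n".join(header) + "\n")
        for i, (x, y, z) in enumerate(points):
            line = f"{float(x):.6f} {float(y):.6f} {float(z):.6f}"
            if colors is not None:
                r, g, b = (int(c) for c in colors[i])
                line += f" {r} {g} {b}"
            fh.write(line + "\n")
```

Calling write_ply_ascii("cloud.ply", cloud) on an Nx3 array produces a file that CloudCompare or Open3D can open directly.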

Two-view triangulation

Given matched points in two images and known projection matrices, cv2.triangulatePoints recovers homogeneous 3D coordinates. Reprojection error measures calibration and correspondence quality.

import cv2
import numpy as np

# P1, P2: 3x4 projection matrices; x1, x2: 2xN float arrays of matched pixel coords
X_h = cv2.triangulatePoints(P1, P2, x1, x2)  # 4xN homogeneous points
X = (X_h[:3] / X_h[3]).T                     # Nx3 Euclidean points
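
The reprojection error mentioned above can be checked with plain NumPy: project the triangulated points back through a projection matrix and compare with the observed pixels. This is a sketch with synthetic, hypothetical values; reprojection_error is an illustrative helper, not an OpenCV function.

```python
import numpy as np

def reprojection_error(P, X, x_obs):
    """Mean pixel distance between observed points and projections of X.

    P: 3x4 projection matrix, X: Nx3 points, x_obs: Nx2 observed pixels.
    """
    X_h = np.hstack([X, np.ones((X.shape[0], 1))])
    proj = (P @ X_h.T).T
    proj = proj[:, :2] / proj[:, 2:3]
    return float(np.mean(np.linalg.norm(proj - x_obs, axis=1)))

# Synthetic check with hypothetical intrinsics and points
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
P_left = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
X_true = np.array([[0.0, 0.0, 4.0], [1.0, 0.5, 5.0]])
x_exact = np.array([[320.0, 240.0], [480.0, 320.0]])

err_exact = reprojection_error(P_left, X_true, x_exact)
err_shifted = reprojection_error(P_left, X_true, x_exact + [3.0, 4.0])
```

Perfect observations give zero error; shifting every observation by (3, 4) px gives a mean error of 5 px. Errors well above a pixel on real data usually point to bad matches or stale calibration.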

What comes next

The following tutorials in this hub cover camera calibration (intrinsics, distortion), stereo vision in depth, and SLAM for simultaneous localization and mapping—closing the loop from single images to moving sensors.

Takeaways

  • 3D reasoning starts from the pinhole model and calibrated K, R, t.
  • Stereo disparity gives metric depth given a known baseline, focal length, and rectified images.
  • Point clouds unify depth outputs for robotics and 3D analytics.

Quick FAQ

Reflective or textureless surfaces break stereo matching; combine multiple views, structured light, or temporal filtering. Check calibration and exposure sync between cameras.

RGB-D (Kinect-class) projects a pattern or uses ToF for dense depth at short range; stereo scales with baseline and works outdoors with good calibration.

Camera calibration

Chessboard object points

Define 3D coordinates of inner corners in board units (e.g. mm). For a board with cols × rows inner corners, Z = 0 on the plane.

import numpy as np

cols, rows = 9, 6
square_size = 25.0  # mm
objp = np.zeros((cols * rows, 3), np.float32)
objp[:, :2] = np.mgrid[0:cols, 0:rows].T.reshape(-1, 2)
objp *= square_size

Find and refine corners

import cv2

img = cv2.imread("calib_01.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
pattern = (cols, rows)
found, corners = cv2.findChessboardCorners(gray, pattern, None)

if found:
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001)
    corners2 = cv2.cornerSubPix(gray, corners, (11, 11), (-1, -1), criteria)
    cv2.drawChessboardCorners(img, pattern, corners2, found)

Adaptive flags for hard images

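flags = cv2.CALIB_CB_ADAPTIVE_THRESH + cv2.CALIB_CB_NORMALIZE_IMAGE
found, corners = cv2.findChessboardCorners(gray, pattern, flags=flags)
flags_sb = cv2.CALIB_CB_NORMALIZE_IMAGE + cv2.CALIB_CB_EXHAUSTIVE + cv2.CALIB_CB_ACCURACY  # SB has its own flags
found, corners = cv2.findChessboardCornersSB(gray, pattern, flags=flags_sb)  # often sharper

CALIB_CB_ADAPTIVE_THRESH applies only to findChessboardCorners; findChessboardCornersSB takes its own flag set, which must be passed by keyword because the third positional argument is corners. findChessboardCornersSB is available in newer OpenCV (4.x); fall back to findChessboardCorners if missing.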

Run calibrateCamera

Collect objpoints (repeated objp per image) and imgpoints (detected corners). Image size is (width, height).

objpoints = []   # 3d in world
imgpoints = []   # 2d in image

# loop over images: append objp and corners2 when found
# ...

h, w = gray.shape[:2]
rms, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(
    objpoints, imgpoints, (w, h), None, None,
    flags=cv2.CALIB_FIX_K3)  # optional: hold k3 fixed (0 here) to limit overfitting on low-distortion lenses

print("RMS reprojection error (px):", rms)
print("K:\n", mtx)
print("dist:", dist.ravel())

Undistort

One-off

dst = cv2.undistort(img, mtx, dist)

Remap (faster for video)

newcameramtx, roi = cv2.getOptimalNewCameraMatrix(mtx, dist, (w, h), 1, (w, h))
mapx, mapy = cv2.initUndistortRectifyMap(mtx, dist, None, newcameramtx, (w, h), cv2.CV_32FC1)
undist = cv2.remap(img, mapx, mapy, cv2.INTER_LINEAR)
x, y, w2, h2 = roi
undist_cropped = undist[y:y+h2, x:x+w2]

projectPoints and error check

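imgpts, _ = cv2.projectPoints(objp, rvecs[0], tvecs[0], mtx, dist)
err = cv2.norm(imgpoints[0], imgpts, cv2.NORM_L2) / len(imgpts)  # both arrays are (N, 1, 2) float32

This is the L2 norm of the residuals for one image divided by the corner count, a quick per-image indicator. The two arrays must have the same shape, so do not reshape only one of them. Aggregate over the dataset to judge calibration quality.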
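
To aggregate over the dataset, one option is a mean Euclidean error per view, which also makes the worst views easy to spot. This sketch works on plain arrays; in the real workflow the projected points would come from cv2.projectPoints for each view. The helper name and the toy data are assumptions.

```python
import numpy as np

def per_view_errors(detected, projected):
    """Mean Euclidean corner error (px) for each view.

    detected, projected: lists of arrays that reshape to (N, 2).
    """
    errors = []
    for det, proj in zip(detected, projected):
        diff = np.asarray(det).reshape(-1, 2) - np.asarray(proj).reshape(-1, 2)
        errors.append(float(np.mean(np.linalg.norm(diff, axis=1))))
    return errors

# Hypothetical data: two views, the second is off by (0.3, 0.4) px at every corner
det = [np.zeros((4, 1, 2)), np.zeros((4, 1, 2))]
proj = [np.zeros((4, 1, 2)), np.zeros((4, 1, 2)) + [0.3, 0.4]]
errs = per_view_errors(det, proj)
worst_view = int(np.argmax(errs))
```

Views whose error stands far above the rest are candidates for removal before re-running calibrateCamera.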

Save and load

np.savez("calib.npz", mtx=mtx, dist=dist, rms=rms)
D = np.load("calib.npz")
mtx, dist = D["mtx"], D["dist"]

Write YAML with OpenCV

fs = cv2.FileStorage("calib.yml", cv2.FILE_STORAGE_WRITE)
fs.write("camera_matrix", mtx)
fs.write("dist_coeffs", dist)
fs.release()

Practical tips

  • Cover the full frame and multiple angles; avoid all images from the same pose.
  • Use sharp prints; matte board reduces glare.
  • For strong wide-angle / fisheye, consider cv2.fisheye routines instead of the pinhole + Brown model.

Takeaways

  • K and dist describe the camera; rvec, tvec per image place the board in camera coordinates.
  • Lower RMS → better fit; outliers usually mean bad corner detection.
  • Use remap when undistorting video streams.

Quick FAQ

Re-detect corners with sub-pixel refinement, remove blurry frames, verify pattern size (inner corners count), and ensure square size is consistent.

A principal point that is not at the image center is normal: manufacturing and alignment tolerances shift cx, cy. Do not force it to the center unless you have a reason (some SLAM systems fix the principal point with extra flags).

Stereo vision

Depth from disparity

After rectification, corresponding points lie on the same scanline. If focal length is f (pixels) and baseline is B (same units as world), depth Z ≈ f · B / d where d is disparity in pixels. OpenCV’s Q matrix from stereoRectify encodes this relationship for reprojectImageTo3D.
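
To see how Q encodes Z ≈ f · B / d, the sketch below builds a Q matrix by hand in the layout given in the OpenCV documentation for stereoRectify and applies it to one pixel. It assumes a horizontal rig where the x-component of T is negative (Tx = −B) and equal principal points in both rectified views; the numbers and the helper names are hypothetical.

```python
import numpy as np

def make_q(f, cx, cy, tx, cx_right=None):
    """4x4 reprojection matrix in the layout documented for stereoRectify."""
    cx_right = cx if cx_right is None else cx_right
    return np.array([[1.0, 0.0, 0.0, -cx],
                     [0.0, 1.0, 0.0, -cy],
                     [0.0, 0.0, 0.0, f],
                     [0.0, 0.0, -1.0 / tx, (cx - cx_right) / tx]])

def reproject_pixel(Q, x, y, d):
    """Map one pixel (x, y) with disparity d to (X, Y, Z)."""
    X, Y, Z, W = Q @ np.array([x, y, d, 1.0])
    return np.array([X, Y, Z]) / W

# Hypothetical rectified rig: f = 700 px, baseline 0.12 m, so Tx = -0.12
Q_demo = make_q(f=700.0, cx=320.0, cy=240.0, tx=-0.12)
point = reproject_pixel(Q_demo, x=390.0, y=240.0, d=35.0)
```

Here W = d / B, so Z = f / W = f · B / d = 2.4 m, and X = (x − cx) · Z / f = 0.24 m. In practice, use the Q returned by stereoRectify instead of building one by hand.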

Stereo calibration and rectification

Assume each camera is calibrated (K1, D1, K2, D2). Collect paired chessboard views; stereoCalibrate estimates R, T from camera 1 to camera 2, then stereoRectify builds rectifying transforms and Q.

import cv2
import numpy as np

# K1, D1, K2, D2 from mono calib; image_size = (w, h)
flags = cv2.CALIB_FIX_INTRINSIC
criteria = (cv2.TERM_CRITERIA_MAX_ITER + cv2.TERM_CRITERIA_EPS, 100, 1e-5)

rms, K1o, D1o, K2o, D2o, R, T, E, F = cv2.stereoCalibrate(
    objpoints, imgpoints_l, imgpoints_r,
    K1, D1, K2, D2, image_size, criteria=criteria, flags=flags)

R1, R2, P1, P2, Q, roi1, roi2 = cv2.stereoRectify(
    K1o, D1o, K2o, D2o, image_size, R, T, alpha=0)

Python returns nine values: retval, K1, D1, K2, D2, R, T, E, F. With CALIB_FIX_INTRINSIC, K*/D* usually match inputs but must still be unpacked.

Build remap maps

map1x, map1y = cv2.initUndistortRectifyMap(K1o, D1o, R1, P1, image_size, cv2.CV_32FC1)
map2x, map2y = cv2.initUndistortRectifyMap(K2o, D2o, R2, P2, image_size, cv2.CV_32FC1)

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)
left_r = cv2.remap(left, map1x, map1y, cv2.INTER_LINEAR)
right_r = cv2.remap(right, map2x, map2y, cv2.INTER_LINEAR)

alpha=0 zooms and crops the rectified images so that only valid pixels remain; alpha=1 keeps every source pixel, which leaves black (invalid) regions near the borders.

StereoBM (fast block matching)

stereo_bm = cv2.StereoBM_create(numDisparities=128, blockSize=15)
stereo_bm.setPreFilterCap(31)
stereo_bm.setTextureThreshold(10)
stereo_bm.setUniquenessRatio(15)
stereo_bm.setSpeckleWindowSize(100)
stereo_bm.setSpeckleRange(2)

disp_bm = stereo_bm.compute(left_r, right_r).astype(np.float32) / 16.0

StereoSGBM (quality presets)

# Preset A: balanced
sgbm_a = cv2.StereoSGBM_create(
    minDisparity=0, numDisparities=128, blockSize=5,
    P1=8 * 3 * 5**2, P2=32 * 3 * 5**2,
    disp12MaxDiff=1, uniquenessRatio=10,
    speckleWindowSize=100, speckleRange=2, mode=cv2.STEREO_SGBM_MODE_SGBM_3WAY)

# Preset B: wider disparity range, slower (P1/P2 penalties scaled for a 7x7 window)
win = 7
p1_pen, p2_pen = 8 * 3 * win**2, 32 * 3 * win**2  # renamed so P1, P2 from stereoRectify are not overwritten
sgbm_b = cv2.StereoSGBM_create(
    minDisparity=0, numDisparities=256, blockSize=win,
    P1=p1_pen, P2=p2_pen, uniquenessRatio=5, speckleWindowSize=150)

disp = sgbm_a.compute(left_r, right_r).astype(np.float32) / 16.0

numDisparities must be divisible by 16. Tune blockSize (odd): larger → smoother, less detail.
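
A rough way to choose numDisparities is to start from the nearest depth you need to measure: the largest expected disparity is f · B / Z_min, rounded up to a multiple of 16. The sketch below assumes the matcher searches disparities from minDisparity up to, but not including, minDisparity + numDisparities; the rig values and the function name are hypothetical.

```python
def num_disparities_for(focal_px, baseline, z_min, min_disparity=0):
    """Smallest multiple of 16 whose search range covers the disparity at z_min."""
    d_max = focal_px * baseline / z_min        # largest expected disparity (px)
    needed = d_max - min_disparity
    # +1 block so that d_max itself falls strictly inside the search range
    return max(16, (int(needed // 16) + 1) * 16)

# Hypothetical rig: f = 700 px, B = 0.12 m, nearest object at 0.5 m
n = num_disparities_for(focal_px=700.0, baseline=0.12, z_min=0.5)
```

With these numbers the nearest object has a disparity of 168 px, so 176 is the smallest setting that does not clip it. A larger range costs run time and widens the invalid band on the left edge of the disparity map.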

Optional: WLS filter (smoother disparity)

right_matcher = cv2.ximgproc.createRightMatcher(sgbm_a)
disp_left = sgbm_a.compute(left_r, right_r)
disp_right = right_matcher.compute(right_r, left_r)

wls = cv2.ximgproc.createDisparityWLSFilter(matcher_left=sgbm_a)
wls.setLambda(8000)
wls.setSigmaColor(1.5)
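disp_wls = wls.filter(disp_left, left_r, None, disp_right)  # still 16-bit fixed point (x16)
disp_wls = disp_wls.astype(np.float32) / 16.0               # back to pixel units, like disp above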

Requires opencv-contrib module ximgproc.

Reproject to XYZ

points_3d = cv2.reprojectImageTo3D(disp, Q)
mask = disp > disp.min()
cloud = points_3d[mask]  # Nx3 float32, coordinates in space of Q
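
Beyond the minimum-disparity mask, it usually helps to drop non-finite points and anything beyond a plausible range before exporting. A minimal NumPy sketch, where z_max is in the units of Q and both the threshold and the helper name filter_cloud are assumptions:

```python
import numpy as np

def filter_cloud(points_3d, disp, min_disp=0.0, z_max=10.0):
    """Keep finite points with disparity above min_disp and 0 < Z < z_max."""
    z = points_3d[..., 2]
    mask = (disp > min_disp) & np.isfinite(points_3d).all(axis=-1)
    mask &= (z > 0) & (z < z_max)
    return points_3d[mask], mask

# Hypothetical 2x2 example: one good near point, one too far,
# one invalid (negative disparity, infinite coordinates), one good
pts = np.array([[[0.0, 0.0, 2.0], [0.1, 0.0, 50.0]],
                [[np.inf, np.inf, np.inf], [0.2, 0.1, 3.0]]])
disp_demo = np.array([[35.0, 1.5], [-1.0, 20.0]])
cloud_demo, mask_demo = filter_cloud(pts, disp_demo, z_max=10.0)
```

The returned mask has the image shape, so it can also be used to pick matching colors from the rectified left image.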

Visualize disparity

disp_vis = cv2.normalize(disp, None, 0, 255, cv2.NORM_MINMAX, dtype=cv2.CV_8U)
disp_color = cv2.applyColorMap(disp_vis, cv2.COLORMAP_JET)

Takeaways

  • Rectification is mandatory for standard row-aligned matchers.
  • SGBM usually beats BM on thin structures; both need texture.
  • Use Q + valid disparity mask for meaningful 3D points.

Quick FAQ

Check exposure sync, rectification quality, and increase texture (projected pattern) or reduce baseline if range is wrong. Wrong numDisparities clips true shifts.

Express baseline in meters and use consistent units in calibration; square_size in object points must match real board for metric scale.
