3D vision (introduction)
Pinhole projection
A 3D point X = (X, Y, Z) in the camera frame projects to the image plane via focal length f and principal point (cx, cy):
x = f · X/Z + cx, y = f · Y/Z + cy
In homogeneous coordinates this is a 3×4 projection matrix K [R | t] combining intrinsics K and extrinsics R, t. Lens distortion (radial/tangential) is corrected before accurate geometric reasoning—see the calibration chapter.
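As a minimal sketch of the projection described above, the following NumPy snippet builds P = K [R | t] and projects two points. The focal length, principal point, and pose are illustrative assumptions rather than values from a real calibration, and `project_points` is a hypothetical helper name.

```python
import numpy as np

def project_points(points_world, K, R, t):
    """Project Nx3 world points to Nx2 pixel coordinates with P = K [R | t]."""
    P = K @ np.hstack([R, t.reshape(3, 1)])                 # 3x4 projection matrix
    ones = np.ones((len(points_world), 1))
    homog = np.hstack([points_world, ones])                 # Nx4 homogeneous points
    uvw = homog @ P.T                                       # Nx3 homogeneous pixels
    return uvw[:, :2] / uvw[:, 2:3]                         # perspective divide

# Illustrative intrinsics and an identity pose (assumptions)
f, cx, cy = 800.0, 320.0, 240.0
K = np.array([[f, 0.0, cx], [0.0, f, cy], [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.zeros(3)

pts = np.array([[0.1, -0.05, 2.0], [0.0, 0.0, 1.0]])
pixels = project_points(pts, K, R, t)
print(pixels)
```

With an identity pose this reduces to x = f · X/Z + cx, y = f · Y/Z + cy, so a point on the optical axis lands on the principal point.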
Depth from stereo
Two calibrated cameras with known relative pose (baseline B) view the same scene. A 3D point appears at horizontal positions xL, xR after rectification. Disparity d = xL − xR relates to depth by Z ≈ f · B / d; with f and d in pixels, Z comes out in the units of B. Dense stereo estimates disparity at every pixel (SGBM, BM in OpenCV); quality depends on texture and calibration.
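The disparity-to-depth relation can be sketched in a few lines. The focal length and baseline below are made-up values, and `disparity_to_depth` is a hypothetical helper; non-positive disparities are treated as invalid and mapped to NaN.

```python
import numpy as np

def disparity_to_depth(disp, focal_px, baseline):
    """Z = f * B / d for positive disparities; invalid pixels become NaN."""
    disp = np.asarray(disp, dtype=np.float64)
    depth = np.full(disp.shape, np.nan)
    valid = disp > 0                      # zero or negative disparity has no depth
    depth[valid] = focal_px * baseline / disp[valid]
    return depth

focal_px = 700.0      # assumed focal length in pixels
baseline_m = 0.12     # assumed baseline in metres

disp = np.array([[84.0, 42.0], [0.0, -1.0]])
depth = disparity_to_depth(disp, focal_px, baseline_m)
print(depth)
```

Halving the disparity doubles the depth, which is why a fixed disparity error costs far more depth accuracy on distant points than on near ones.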
Rectification
Warp both images so epipolar lines align horizontally—simplifies matching to 1D search along rows.
Monocular depth
Single-image CNNs (MiDaS, DPT) predict relative depth without stereo—useful but scale-ambiguous without constraints.
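One common constraint is a handful of pixels with known metric depth. The sketch below assumes the network output is related to metric inverse depth by an unknown scale and shift (a convention often associated with MiDaS-style models; check the documentation of the model you use) and recovers both by least squares. The data is synthetic and `fit_scale_shift` is a hypothetical helper.

```python
import numpy as np

def fit_scale_shift(pred, target):
    """Least-squares s, t such that s * pred + t approximates target."""
    A = np.stack([pred, np.ones_like(pred)], axis=1)   # Nx2 design matrix
    (s, t), *_ = np.linalg.lstsq(A, target, rcond=None)
    return s, t

# Synthetic reference pixels with known metric depth
true_depth = np.array([1.0, 2.0, 4.0, 5.0])
true_inv = 1.0 / true_depth

# Hypothetical relative prediction: inverse depth with unknown scale 3 and shift 0.05
pred = (true_inv - 0.05) / 3.0

s, t = fit_scale_shift(pred, true_inv)
metric_depth = 1.0 / (s * pred + t)
print(s, t, metric_depth)
```

On real predictions the fit is only as good as the reference depths, so a robust variant (for example, discarding outliers before fitting) is usually worth considering.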
OpenCV stereo (workflow sketch)
import cv2
import numpy as np
# After calibration: cameraMatrix1, dist1, cameraMatrix2, dist2, R, T
# stereoRectify → R1, R2, P1, P2, Q, roi1, roi2
# initUndistortRectifyMap + remap for left/right rectified images
# Example disparity (tune numDisparities, blockSize)
left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)
stereo = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128, blockSize=5)
disp = stereo.compute(left, right).astype(np.float32) / 16.0
# Q from stereoRectify — reproject to XYZ in homogeneous coords
# points_3d = cv2.reprojectImageTo3D(disp, Q)
You must obtain Q from a proper stereoRectify with calibrated intrinsics and stereo extrinsics—placeholder comments mark the missing steps.
Point clouds and meshes
A point cloud is a set of (x, y, z) samples, often with color (RGB-D) or normals. Formats: PLY, PCD, LAS. Downstream: ICP for alignment, RANSAC for plane fitting, Poisson / Delaunay for meshing. Libraries: Open3D, PCL, CloudCompare.
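To make the formats concrete, here is a small sketch that serializes a colored point cloud as ASCII PLY text using only the standard library. `to_ply_ascii` is a hypothetical helper, and the three points are placeholders; production code would more likely use Open3D or PCL I/O.

```python
def to_ply_ascii(points, colors=None):
    """Serialize (x, y, z) points, optionally with 8-bit RGB, as ASCII PLY text."""
    header = [
        "ply",
        "format ascii 1.0",
        f"element vertex {len(points)}",
        "property float x",
        "property float y",
        "property float z",
    ]
    if colors is not None:
        header += ["property uchar red", "property uchar green", "property uchar blue"]
    header.append("end_header")

    rows = []
    for i, (x, y, z) in enumerate(points):
        row = f"{x:.6f} {y:.6f} {z:.6f}"
        if colors is not None:
            r, g, b = colors[i]
            row += f" {int(r)} {int(g)} {int(b)}"
        rows.append(row)
    return "\n".join(header + rows) + "\n"

points = [(0.0, 0.0, 1.0), (0.1, 0.0, 1.2), (0.0, 0.1, 0.9)]
colors = [(255, 0, 0), (0, 255, 0), (0, 0, 255)]
ply_text = to_ply_ascii(points, colors)
print(ply_text)
```

Writing `ply_text` to a `.ply` file should give something that common viewers such as CloudCompare can open; binary PLY is preferable for large clouds.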
Two-view triangulation
Given matched points in two images and known projection matrices, cv2.triangulatePoints recovers homogeneous 3D coordinates. Reprojection error measures calibration and correspondence quality.
import cv2
import numpy as np
# P1, P2: 3x4 projection matrices; x1, x2: 2xN float pixel coords (not homogeneous)
X_h = cv2.triangulatePoints(P1, P2, x1, x2)
X = (X_h[:3] / X_h[3]).T
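To show what two-view triangulation does under the hood, the sketch below implements a plain linear (DLT) triangulation in NumPy and checks it by reprojection. It is an illustration of the technique, not a description of OpenCV's internals; the cameras and the 3D point are synthetic, and `triangulate_dlt` and `project` are hypothetical helpers.

```python
import numpy as np

def triangulate_dlt(P1, P2, x1, x2):
    """Linear triangulation of one point from two pixel observations."""
    # Each observation contributes two rows of the homogeneous system A X = 0
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, vt = np.linalg.svd(A)
    X_h = vt[-1]                       # right singular vector of the smallest singular value
    return X_h[:3] / X_h[3]

def project(P, X):
    """Project a 3D point with a 3x4 matrix and return pixel coordinates."""
    uvw = P @ np.append(X, 1.0)
    return uvw[:2] / uvw[2]

# Synthetic setup: identical intrinsics, second camera 0.1 units to the right
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.1], [0.0], [0.0]])])

X_true = np.array([0.2, -0.1, 3.0])
x1, x2 = project(P1, X_true), project(P2, X_true)

X_est = triangulate_dlt(P1, P2, x1, x2)
reproj_err = np.linalg.norm(project(P1, X_est) - x1)
print(X_est, reproj_err)
```

With noise-free matches the reprojection error is essentially zero; with real correspondences a large value points to a bad match or a calibration problem.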
What comes next
The following tutorials in this hub cover camera calibration (intrinsics, distortion), stereo vision in depth, and SLAM for simultaneous localization and mapping—closing the loop from single images to moving sensors.
Takeaways
- 3D reasoning starts from the pinhole model and calibrated K, R, t.
- Stereo disparity gives metric depth given baseline and rectification.
- Point clouds unify depth outputs for robotics and 3D analytics.
Camera calibration
Chessboard object points
Define 3D coordinates of inner corners in board units (e.g. mm). For a board with cols × rows inner corners, Z = 0 on the plane.
import numpy as np
cols, rows = 9, 6
square_size = 25.0 # mm
objp = np.zeros((cols * rows, 3), np.float32)
objp[:, :2] = np.mgrid[0:cols, 0:rows].T.reshape(-1, 2)
objp *= square_size
Find and refine corners
import cv2
img = cv2.imread("calib_01.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
pattern = (cols, rows)
found, corners = cv2.findChessboardCorners(gray, pattern, None)
if found:
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001)
    corners2 = cv2.cornerSubPix(gray, corners, (11, 11), (-1, -1), criteria)
    cv2.drawChessboardCorners(img, pattern, corners2, found)
Adaptive flags for hard images
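flags = cv2.CALIB_CB_ADAPTIVE_THRESH + cv2.CALIB_CB_NORMALIZE_IMAGE
found, corners = cv2.findChessboardCorners(gray, pattern, flags=flags)
# Sector-based detector, often sharper; it has its own flag set, so pass flags by keyword
found, corners = cv2.findChessboardCornersSB(gray, pattern, flags=cv2.CALIB_CB_NORMALIZE_IMAGE)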
findChessboardCornersSB is available in newer OpenCV; fall back to findChessboardCorners if missing.
Run calibrateCamera
Collect objpoints (repeated objp per image) and imgpoints (detected corners). Image size is (width, height).
objpoints = [] # 3d in world
imgpoints = [] # 2d in image
# loop over images: append objp and corners2 when found
# ...
h, w = gray.shape[:2]
rms, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(
    objpoints, imgpoints, (w, h), None, None,
    flags=cv2.CALIB_FIX_K3)  # optional: hold k3 at 0 for low-distortion lenses; wide FOV usually needs it free
print("RMS reprojection error (px):", rms)
print("K:\n", mtx)
print("dist:", dist.ravel())
Undistort
One-off
dst = cv2.undistort(img, mtx, dist)
Remap (faster for video)
newcameramtx, roi = cv2.getOptimalNewCameraMatrix(mtx, dist, (w, h), 1, (w, h))
mapx, mapy = cv2.initUndistortRectifyMap(mtx, dist, None, newcameramtx, (w, h), cv2.CV_32FC1)
undist = cv2.remap(img, mapx, mapy, cv2.INTER_LINEAR)
x, y, w2, h2 = roi
undist_cropped = undist[y:y+h2, x:x+w2]
projectPoints and error check
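imgpts, _ = cv2.projectPoints(objp, rvecs[0], tvecs[0], mtx, dist)
# Flatten both arrays to Nx2 so cv2.norm compares matching shapes
detected = imgpoints[0].reshape(-1, 2)
err = cv2.norm(detected, imgpts.reshape(-1, 2), cv2.NORM_L2) / len(imgpts)
This is the L2 norm of all corner residuals divided by the corner count, a rough per-corner figure for one image; aggregate over the dataset to judge calibration quality.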
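Aggregating over the dataset can be sketched as follows. The helper takes lists of detected and reprojected corners (either Nx2 or Nx1x2, as OpenCV returns them) and reports a mean error per image plus an overall RMS in pixels. The arrays are synthetic stand-ins and `reprojection_stats` is a hypothetical name; the RMS here is comparable in spirit to the value returned by calibrateCamera, though that equivalence is an assumption worth checking on your own data.

```python
import numpy as np

def reprojection_stats(detected, reprojected):
    """Per-image mean corner error and overall RMS, both in pixels."""
    per_image_mean = []
    sq_sum, count = 0.0, 0
    for det, rep in zip(detected, reprojected):
        d = np.asarray(det, dtype=np.float64).reshape(-1, 2)
        r = np.asarray(rep, dtype=np.float64).reshape(-1, 2)
        dist = np.linalg.norm(d - r, axis=1)      # Euclidean error per corner
        per_image_mean.append(float(dist.mean()))
        sq_sum += float((dist ** 2).sum())
        count += len(dist)
    return per_image_mean, float(np.sqrt(sq_sum / count))

# Synthetic stand-ins for imgpoints and projectPoints output
det = [np.array([[10.0, 10.0], [20.0, 20.0]]),
       np.array([[[5.0, 5.0]], [[6.0, 6.0]]])]
rep = [np.array([[10.0, 11.0], [20.0, 20.0]]),
       np.array([[[5.0, 5.0]], [[9.0, 10.0]]])]

per_image, rms_all = reprojection_stats(det, rep)
print(per_image, rms_all)
```

An image whose mean error is far above the rest is a good candidate for removal before recalibrating.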
Save and load
np.savez("calib.npz", mtx=mtx, dist=dist, rms=rms)
D = np.load("calib.npz")
mtx, dist = D["mtx"], D["dist"]
Write YAML with OpenCV
fs = cv2.FileStorage("calib.yml", cv2.FILE_STORAGE_WRITE)
fs.write("camera_matrix", mtx)
fs.write("dist_coeffs", dist)
fs.release()
Practical tips
- Cover the full frame and multiple angles; avoid all images from the same pose.
- Use sharp prints; matte board reduces glare.
- For strong wide-angle / fisheye, consider cv2.fisheye routines instead of the pinhole + Brown model.
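For reference, the Brown model named in the last tip can be sketched directly. The function applies radial (k1, k2, k3) and tangential (p1, p2) terms to normalized image coordinates, following the coefficient order OpenCV uses for dist; the coefficient values are invented for illustration and `distort_normalized` is a hypothetical helper.

```python
def distort_normalized(x, y, k1, k2, p1, p2, k3=0.0):
    """Apply Brown radial + tangential distortion to normalized coordinates."""
    r2 = x * x + y * y
    radial = 1.0 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
    x_d = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
    y_d = y * radial + p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
    return x_d, y_d

# Illustrative barrel distortion (assumed coefficients)
k1, k2, p1, p2 = -0.2, 0.05, 0.0, 0.0

center = distort_normalized(0.0, 0.0, k1, k2, p1, p2)
edge = distort_normalized(0.5, 0.0, k1, k2, p1, p2)
print(center, edge)
```

The principal point is unaffected while points away from the center are pulled inward with negative k1, which is why corners near the image border constrain the coefficients best.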
Takeaways
- K and dist describe the camera; rvec, tvec per image place the board in camera coordinates.
- Lower RMS → better fit; outliers usually mean bad corner detection.
- Use remap when undistorting video streams.
Stereo vision
Depth from disparity
After rectification, corresponding points lie on the same scanline. If focal length is f (pixels) and baseline is B (same units as world), depth Z ≈ f · B / d where d is disparity in pixels. OpenCV’s Q matrix from stereoRectify encodes this relationship for reprojectImageTo3D.
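The sketch below builds a Q matrix by hand to show how it reproduces Z = f · B / d. It assumes the layout documented for stereoRectify with a horizontal baseline, identical principal points in both rectified views, and a negative Tx for the usual left/right arrangement; all numbers are made up, and Q from your own stereoRectify call should be used in practice.

```python
import numpy as np

def reproject_pixel(Q, u, v, d):
    """Map pixel (u, v) with disparity d to 3D via [X Y Z W]^T = Q [u v d 1]^T."""
    X, Y, Z, W = Q @ np.array([u, v, d, 1.0])
    return np.array([X, Y, Z]) / W

f, cx, cy = 700.0, 320.0, 240.0   # assumed rectified intrinsics (pixels)
baseline = 0.12                   # assumed baseline (metres)
Tx = -baseline                    # assumed sign: right camera along +x of the left one

Q = np.array([
    [1.0, 0.0, 0.0, -cx],
    [0.0, 1.0, 0.0, -cy],
    [0.0, 0.0, 0.0, f],
    [0.0, 0.0, -1.0 / Tx, 0.0],   # last entry is zero when both principal points match
])

point = reproject_pixel(Q, u=390.0, v=240.0, d=42.0)
z_formula = f * baseline / 42.0
print(point, z_formula)
```

The Z component matches the closed-form depth, and X, Y follow from back-projecting the pixel offset from the principal point at that depth.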
Stereo calibration and rectification
Assume each camera is calibrated (K1, D1, K2, D2). Collect paired chessboard views; stereoCalibrate estimates R, T from camera 1 to camera 2, then stereoRectify builds rectifying transforms and Q.
import cv2
import numpy as np
# K1, D1, K2, D2 from mono calib; image_size = (w, h)
flags = cv2.CALIB_FIX_INTRINSIC
criteria = (cv2.TERM_CRITERIA_MAX_ITER + cv2.TERM_CRITERIA_EPS, 100, 1e-5)
rms, K1o, D1o, K2o, D2o, R, T, E, F = cv2.stereoCalibrate(
objpoints, imgpoints_l, imgpoints_r,
K1, D1, K2, D2, image_size, criteria=criteria, flags=flags)
R1, R2, P1, P2, Q, roi1, roi2 = cv2.stereoRectify(
K1o, D1o, K2o, D2o, image_size, R, T, alpha=0)
Python returns nine values: retval, K1, D1, K2, D2, R, T, E, F. With CALIB_FIX_INTRINSIC, K*/D* usually match inputs but must still be unpacked.
Build remap maps
map1x, map1y = cv2.initUndistortRectifyMap(K1o, D1o, R1, P1, image_size, cv2.CV_32FC1)
map2x, map2y = cv2.initUndistortRectifyMap(K2o, D2o, R2, P2, image_size, cv2.CV_32FC1)
left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)
left_r = cv2.remap(left, map1x, map1y, cv2.INTER_LINEAR)
right_r = cv2.remap(right, map2x, map2y, cv2.INTER_LINEAR)
alpha=0 scales and crops the rectified images so only valid pixels remain; alpha=1 keeps every source pixel, which can leave black (invalid) regions at the borders.
StereoBM (fast block matching)
stereo_bm = cv2.StereoBM_create(numDisparities=128, blockSize=15)
stereo_bm.setPreFilterCap(31)
stereo_bm.setTextureThreshold(10)
stereo_bm.setUniquenessRatio(15)
stereo_bm.setSpeckleWindowSize(100)
stereo_bm.setSpeckleRange(2)
disp_bm = stereo_bm.compute(left_r, right_r).astype(np.float32) / 16.0
StereoSGBM (quality presets)
# Preset A: balanced
sgbm_a = cv2.StereoSGBM_create(
minDisparity=0, numDisparities=128, blockSize=5,
P1=8 * 3 * 5**2, P2=32 * 3 * 5**2,
disp12MaxDiff=1, uniquenessRatio=10,
speckleWindowSize=100, speckleRange=2, mode=cv2.STEREO_SGBM_MODE_SGBM_3WAY)
# Preset B: wider disparity range, smoother but slower (P1/P2 scaled for the 7x7 window)
win = 7
P1, P2 = 8 * 3 * win**2, 32 * 3 * win**2
sgbm_b = cv2.StereoSGBM_create(
minDisparity=0, numDisparities=256, blockSize=win,
P1=P1, P2=P2, uniquenessRatio=5, speckleWindowSize=150)
disp = sgbm_a.compute(left_r, right_r).astype(np.float32) / 16.0
numDisparities must be divisible by 16. Tune blockSize (odd): larger → smoother, less detail.
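A quick way to pick these values is to start from the nearest depth you expect and work backwards. The sketch below computes the largest disparity from f · B / Z_min, rounds the search range up to a multiple of 16, and forces an odd block size. The camera numbers are assumptions, and both helpers are hypothetical.

```python
import math

def required_num_disparities(focal_px, baseline, min_depth, min_disparity=0):
    """Smallest multiple of 16 covering the disparity of the nearest expected point."""
    max_disp = focal_px * baseline / min_depth      # largest shift the matcher must find
    span = max(max_disp - min_disparity, 1.0)
    return int(math.ceil(span / 16.0)) * 16

def make_odd(block_size):
    """Round an even block size up to the next odd value."""
    return block_size if block_size % 2 == 1 else block_size + 1

# Assumed rig: 700 px focal length, 0.12 m baseline, nearest object at 0.5 m
num_disp = required_num_disparities(focal_px=700.0, baseline=0.12, min_depth=0.5)
block = make_odd(6)
print(num_disp, block)
```

If the range is set below this value, points closer than the corresponding depth cannot be matched correctly, so it is safer to err on the large side and accept the extra compute.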
Optional: WLS filter (smoother disparity)
right_matcher = cv2.ximgproc.createRightMatcher(sgbm_a)
disp_left = sgbm_a.compute(left_r, right_r)
disp_right = right_matcher.compute(right_r, left_r)
wls = cv2.ximgproc.createDisparityWLSFilter(matcher_left=sgbm_a)
wls.setLambda(8000)
wls.setSigmaColor(1.5)
disp_wls = wls.filter(disp_left, left_r, None, disp_right)
Requires the ximgproc module from opencv-contrib (the opencv-contrib-python package when installing with pip).
Reproject to XYZ
points_3d = cv2.reprojectImageTo3D(disp, Q)
mask = disp > disp.min()
cloud = points_3d[mask] # Nx3 float32, coordinates in space of Q
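A stricter validity mask is often useful, since zero or unmatched disparities can reproject to infinite or implausible coordinates. The sketch below is one possible filter under assumed thresholds; it runs on synthetic arrays shaped like the output of reprojectImageTo3D (H x W x 3), and `filter_cloud` is a hypothetical helper.

```python
import numpy as np

def filter_cloud(points_3d, disp, min_disp=0.0, max_depth=np.inf):
    """Keep points with valid disparity, finite coordinates, and plausible depth."""
    pts = np.asarray(points_3d, dtype=np.float64)
    z = pts[..., 2]
    valid = (
        (np.asarray(disp) > min_disp)
        & np.isfinite(pts).all(axis=2)
        & (z > 0)
        & (z <= max_depth)
    )
    return pts[valid], valid

# Synthetic 2x2 example: one good point, one infinite, one too far, one behind the camera
points_demo = np.array([
    [[0.2, 0.0, 2.0], [np.inf, np.inf, np.inf]],
    [[0.0, 0.0, 50.0], [0.1, 0.1, -1.0]],
])
disp_demo = np.array([[42.0, 0.0], [1.7, 5.0]])

cloud_demo, valid_demo = filter_cloud(points_demo, disp_demo, max_depth=10.0)
print(cloud_demo, valid_demo)
```

The depth cap is scene-dependent; choose it from the working range of your rig rather than reusing the value shown here.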
Visualize disparity
disp_vis = cv2.normalize(disp, None, 0, 255, cv2.NORM_MINMAX, dtype=cv2.CV_8U)
disp_color = cv2.applyColorMap(disp_vis, cv2.COLORMAP_JET)
Takeaways
- Rectification is mandatory for standard row-aligned matchers.
- SGBM usually beats BM on thin structures; both need texture.
- Use Q + a valid disparity mask for meaningful 3D points.
Quick FAQ
- A numDisparities value that is too small clips true shifts.
- square_size in object points must match the real board for metric scale.