Localization vs mapping
Localization: given a map, where am I? Mapping: given known poses, what does the world look like? SLAM couples the two because each helps the other: a better map improves pose estimates, and better poses improve the map. Without global constraints, integrating only relative motion drifts; loop closure detects revisited places and redistributes the accumulated error over the trajectory.
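A toy dead-reckoning sketch makes the drift concrete (all numbers hypothetical): integrating 100 noisy relative motions accumulates an error that no amount of further relative measurements can remove—only a global constraint like a loop closure can.

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed for repeatability
true_step = np.array([1.0, 0.0])        # robot moves 1 m forward per step
pose = np.zeros(2)
for _ in range(100):
    # each relative-motion estimate carries a little noise (5 cm std)
    pose += true_step + rng.normal(0.0, 0.05, size=2)
drift = np.linalg.norm(pose - 100 * true_step)
print(f"dead-reckoning drift after 100 steps: {drift:.2f} m")
```

The per-step noise is tiny, but the position error grows with path length because nothing ever corrects it.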
Visual odometry (VO)
Frame-to-frame (or frame-to-local-map) motion estimation; effectively SLAM without global optimization or loop closure. VO often runs as the front-end module inside a full SLAM system.
Sensor fusion
IMU + camera (visual-inertial odometry, VIO) or LiDAR + camera: filters (e.g. an EKF) or factor graphs fuse motion-model predictions with sensor measurements.
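A minimal 1D Kalman-filter sketch of the predict/update idea (illustrative only, not a real VIO; all numbers hypothetical): an IMU-style motion prediction is corrected by a camera-style position fix.

```python
import numpy as np

x = np.array([0.0, 0.0])          # state: [position, velocity]
P = np.eye(2)                     # state covariance
F = np.array([[1.0, 0.1],         # constant-velocity model, dt = 0.1 s
              [0.0, 1.0]])
Q = 0.01 * np.eye(2)              # process noise (prediction uncertainty)
H = np.array([[1.0, 0.0]])        # camera measures position only
R = np.array([[0.05]])            # measurement noise

x = F @ x                         # predict with the motion model
P = F @ P @ F.T + Q
z = np.array([0.12])              # hypothetical camera position fix
y = z - H @ x                     # innovation
S = H @ P @ H.T + R
K = P @ H.T @ np.linalg.inv(S)    # Kalman gain
x = x + K @ y                     # update: blend prediction and measurement
P = (np.eye(2) - K @ H) @ P
print(x)                          # fused estimate pulled toward the fix
```

The gain K weighs the measurement against the prediction by their uncertainties; factor-graph fusion generalizes this to whole trajectories.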
Typical visual SLAM pipeline
- Extract and match features (ORB, FAST+BRIEF, …).
- Estimate motion between frames (essential/fundamental matrix, PnP with map points).
- Triangulate new 3D landmarks; local bundle adjustment.
- Detect loops; optimize pose graph or global BA.
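The last step can be sketched as a toy 1D pose graph (all numbers hypothetical): three odometry constraints of roughly 1 m each, plus one loop-closure-style constraint on the final pose, solved by linear least squares so the disagreement is spread across all poses instead of landing on the last one.

```python
import numpy as np

# Unknowns: x1, x2, x3 (x0 fixed at 0).
# Rows: x1 = 1.05; x2 - x1 = 0.95; x3 - x2 = 1.02 (odometry),
# plus a global constraint x3 = 2.90.
A = np.array([[ 1, 0, 0],
              [-1, 1, 0],
              [ 0,-1, 1],
              [ 0, 0, 1]], dtype=float)
b = np.array([1.05, 0.95, 1.02, 2.90])
x, *_ = np.linalg.lstsq(A, b, rcond=None)
print(x)   # → [1.02 1.94 2.93]; odometry alone would give [1.05, 2.00, 3.02]
```

Real pose-graph optimization is the same idea with nonlinear SE(3) constraints and iterative solvers (Gauss-Newton/Levenberg-Marquardt).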
Example: match two frames
import cv2
import numpy as np

g1 = cv2.imread("frame0.jpg", cv2.IMREAD_GRAYSCALE)
g2 = cv2.imread("frame1.jpg", cv2.IMREAD_GRAYSCALE)
orb = cv2.ORB_create(1000)  # detect up to 1000 ORB keypoints
k1, d1 = orb.detectAndCompute(g1, None)
k2, d2 = orb.detectAndCompute(g2, None)
# Hamming distance for binary descriptors; cross-check keeps mutual best matches
bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = bf.match(d1, d2)
matches = sorted(matches, key=lambda m: m.distance)[:200]  # keep 200 best
pts1 = np.float32([k1[m.queryIdx].pt for m in matches])
pts2 = np.float32([k2[m.trainIdx].pt for m in matches])
Essential matrix and recoverPose
With calibrated intrinsics K, the essential matrix E relates normalized correspondences; recoverPose decomposes E into a rotation R and translation t (for monocular input, t is only recovered up to scale).
# fx, fy, cx, cy: calibrated focal lengths and principal point
K = np.array([[fx, 0, cx], [0, fy, cy], [0, 0, 1]], dtype=np.float64)
E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, prob=0.999, threshold=1.0)
in1 = pts1[mask.ravel() == 1]
in2 = pts2[mask.ravel() == 1]
_, R, t, _ = cv2.recoverPose(E, in1, in2, K)
# R,t: motion from frame1 coords to frame2 (convention: verify OpenCV docs)
Fundamental matrix (uncalibrated)
F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 1.0, 0.99)
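The two matrices are related: with known intrinsics, E = K^T F K (up to scale). A quick synthetic check of that identity, with all matrices hypothetical and E built directly as [t]_x R:

```python
import numpy as np

K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])  # made-up intrinsics
th = 0.3
R = np.array([[np.cos(th), -np.sin(th), 0],   # rotation about z
              [np.sin(th),  np.cos(th), 0],
              [0, 0, 1]])
t = np.array([0.2, -0.1, 1.0])
tx = np.array([[0, -t[2], t[1]],              # skew-symmetric [t]_x
               [t[2], 0, -t[0]],
               [-t[1], t[0], 0]])
E = tx @ R                                    # essential matrix
F = np.linalg.inv(K).T @ E @ np.linalg.inv(K) # uncalibrated counterpart
E_back = K.T @ F @ K
print(np.allclose(E_back, E))                 # → True
```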
Frameworks to explore
- ORB-SLAM3 — feature-based, mono/stereo/RGB-D, loop closing.
- OpenVINS, VINS-Mono — tightly coupled VIO.
- RTAB-Map — RGB-D mapping with ROS integration.
Takeaways
- SLAM = online pose + map; drift needs loop closure or global sensors.
- Two-view geometry (E, recoverPose) is a VO building block.
- Production SLAM uses factor graphs / BA, not just pairwise estimates.