Computer Vision Chapter 42

Autonomous vehicles

Road vehicles use cameras, LiDAR, radar, and maps. Perception estimates where drivable space is, where other agents are, and what traffic controls mean. Classical CV (color masks, Canny, Hough lines) can prototype lane cues; production stacks lean on learned detectors, semantic segmentation, and multi-sensor fusion with rigorous validation (simulation, closed courses, standards like ISO 26262). This page sketches educational building blocks—not a production AD stack.

Lane lines (classical sketch)

import cv2
import numpy as np

def region_of_interest(img, vertices):
    """Keep only the pixels inside the polygon; zero out everything else."""
    mask = np.zeros_like(img)
    cv2.fillPoly(mask, vertices, 255)
    return cv2.bitwise_and(img, mask)

# frame_bgr: a BGR frame, e.g. from cv2.VideoCapture(...).read()
gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (5, 5), 0)   # suppress noise before edge detection
edges = cv2.Canny(blur, 50, 150)           # hysteresis thresholds: low=50, high=150
h, w = edges.shape
# Trapezoidal ROI covering the road ahead (bottom of the frame up to ~60% height)
trapezoid = np.array([[(0, h), (w, h), (int(w * 0.55), int(h * 0.6)), (int(w * 0.45), int(h * 0.6))]], np.int32)
roi = region_of_interest(edges, trapezoid)
lines = cv2.HoughLinesP(roi, rho=1, theta=np.pi / 180, threshold=50, minLineLength=40, maxLineGap=150)
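The raw Hough segments are noisy and fragmented. A common follow-up step (a sketch, not part of the original pipeline above) is to split segments into left and right groups by slope sign, discard near-horizontal ones, and average each group into a single lane line. The helper name `average_lane_lines` is hypothetical; it assumes the `(N, 1, 4)` array layout returned by `cv2.HoughLinesP`.

```python
import numpy as np

def average_lane_lines(lines, min_slope=0.3):
    """Average Hough segments into one (slope, intercept) per lane side.

    In image coordinates (y grows downward), the left lane line has a
    negative slope and the right lane line a positive slope.
    """
    left, right = [], []
    for x1, y1, x2, y2 in lines[:, 0]:
        if x2 == x1:
            continue  # skip vertical segments (undefined slope)
        slope = (y2 - y1) / (x2 - x1)
        if abs(slope) < min_slope:
            continue  # skip near-horizontal clutter (shadows, cracks)
        intercept = y1 - slope * x1
        (left if slope < 0 else right).append((slope, intercept))

    def mean_line(group):
        return tuple(np.mean(group, axis=0)) if group else None

    return mean_line(left), mean_line(right)
```

Each returned `(slope, intercept)` pair can then be rendered as a single overlay line between the bottom of the ROI and its upper edge.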

A perspective transform to a bird's-eye view followed by polynomial curve fitting is a common next step; learned lane-detection networks dominate modern benchmarks.
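After warping to a bird's-eye view, curved lanes are typically modeled as a second-order polynomial x = a·y² + b·y + c fitted to lane pixels, from which a radius of curvature can be read off. The sketch below assumes pixel coordinates `(xs, ys)` have already been extracted from a warped binary image (e.g. via `np.nonzero`); converting pixels to meters would need per-axis scale factors not shown here.

```python
import numpy as np

def fit_lane(xs, ys):
    """Fit x = a*y**2 + b*y + c to lane pixels (x as a function of y,
    since lanes are roughly vertical in a bird's-eye view)."""
    return np.polyfit(ys, xs, 2)

def curvature_radius(coeffs, y):
    """Radius of curvature of the fitted polynomial at row y, in pixels:
    R = (1 + (2*a*y + b)**2)**1.5 / |2*a|."""
    a, b, _ = coeffs
    return (1 + (2 * a * y + b) ** 2) ** 1.5 / abs(2 * a)
```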

Objects and freespace

2D detection (cars, pedestrians, signs) uses YOLO-style or two-stage detectors on camera frames. Segmentation labels each pixel (road, sky, vehicle) for dense scene understanding. Depth from stereo or LiDAR projection aligns detections in 3D for planning.
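Aligning LiDAR with camera detections reduces to projecting 3-D points into the image with the camera's intrinsic matrix K (a pinhole-camera sketch; the extrinsic LiDAR-to-camera transform is assumed to have been applied already, and the function name is illustrative).

```python
import numpy as np

def project_to_image(points_cam, K):
    """Project 3-D points in the camera frame (z forward) to pixels.

    points_cam: (N, 3) array of points already transformed into the
    camera frame. K: 3x3 intrinsic matrix. Returns (N, 2) pixel
    coordinates and per-point depth. The caller should mask out points
    with z <= 0 (behind the camera) before drawing.
    """
    z = points_cam[:, 2]
    uvw = points_cam @ K.T          # homogeneous pixel coordinates
    uv = uvw[:, :2] / z[:, None]    # perspective divide
    return uv, z
```

Projected depths falling inside a 2D detection box give a direct range estimate for that object.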

Fusion and time

Multiple sensors and time steps are combined with Kalman or particle filters, or learned trackers. Latency budgets and failure modes (glare, rain, occlusion) drive redundancy—not every scene is visible in one frame.
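As a minimal illustration of the Kalman-filter idea, the sketch below runs one predict/update cycle of a linear Kalman filter with a constant-velocity model, as might be used to track one coordinate of a detection centroid across frames. The matrices F, H, Q, R in the usage are illustrative choices, not tuned values.

```python
import numpy as np

def kf_step(x, P, z, F, H, Q, R):
    """One predict + update cycle of a linear Kalman filter.

    x: state estimate, P: state covariance, z: new measurement,
    F: state transition, H: measurement model, Q/R: process and
    measurement noise covariances.
    """
    # Predict: propagate state and covariance through the motion model
    x = F @ x
    P = F @ P @ F.T + Q
    # Update: blend the prediction with the measurement
    y = z - H @ x                       # innovation
    S = H @ P @ H.T + R                 # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)      # Kalman gain
    x = x + K @ y
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P
```

With state [position, velocity], F = [[1, dt], [0, 1]] encodes constant velocity and H = [[1, 0]] says only position is measured; a dropped detection simply skips the update step, letting the prediction coast through occlusions.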

Takeaways

  • Treat perception outputs as uncertain; downstream planning should reason about risk.
  • Regulatory and ethical obligations exceed pure accuracy metrics.
  • Simulation + real-world ODD (operational design domain) testing are both essential.

Quick FAQ

Is end-to-end learning (pixels to steering) used? Some prototypes map pixels to steering directly; most deployed stacks modularize perception, prediction, and control for interpretability and certification.

What connects the software components? The Robot Operating System (ROS) is widely used in research for message passing between camera drivers, detectors, and planners.