Computer Vision Chapter 18

OCR & Autonomous Driving

Optical character recognition and computer vision stacks for autonomous vehicles.

Optical character recognition

Preprocess for Tesseract

import cv2

# img_bgr: a BGR image array, for example the result of cv2.imread
gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
gray = cv2.GaussianBlur(gray, (3, 3), 0)
_, th = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
# Optional: morphology to close gaps in strokes
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (2, 2))
th = cv2.morphologyEx(th, cv2.MORPH_CLOSE, kernel)

Deskewing, denoising, and contrast normalization often give line-based OCR better results than feeding it raw color photos.

pytesseract

# pip install pytesseract; install Tesseract OCR binary on PATH
import pytesseract
from PIL import Image

text = pytesseract.image_to_string(Image.fromarray(th), lang="eng")
data = pytesseract.image_to_data(Image.fromarray(th), output_type=pytesseract.Output.DICT)

image_to_data returns per-word boxes and confidences for debugging.
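As a rough illustration of how that dictionary might be used while debugging, the sketch below keeps only words above a confidence cutoff. The helper name `confident_words` and the cutoff of 60 are assumptions chosen for illustration; the keys (`text`, `conf`, `left`, `top`, `width`, `height`) are the ones pytesseract returns with `Output.DICT`.

```python
def confident_words(data, min_conf=60.0):
    """Return (text, (left, top, width, height), conf) for confident words.

    `data` is the dict produced by pytesseract.image_to_data with
    output_type=pytesseract.Output.DICT. The 60.0 cutoff is an arbitrary
    starting point, not a recommended value.
    """
    words = []
    for i, text in enumerate(data["text"]):
        # Confidence may arrive as a string or a number depending on version;
        # rows that are not words (blocks, lines) carry a confidence of -1.
        conf = float(data["conf"][i])
        if conf < min_conf or not text.strip():
            continue
        box = (data["left"][i], data["top"][i], data["width"][i], data["height"][i])
        words.append((text, box, conf))
    return words
```

Calling `confident_words(data)` on the result above gives a list that can be drawn back onto the image to see which regions Tesseract is unsure about.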

Scene text (idea)

OpenCV’s DNN module can run a frozen EAST text-detection model or ONNX recognition models. The usual pipeline is: produce quadrilaterals or axis-aligned boxes, warp each crop to a fixed height, then run a recognition network. Training on custom data usually yields better in-domain accuracy than generic English-only models.
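The "warp crops to fixed height" step can be sketched as a small geometry helper. This is only an illustration under assumptions: the function name `rectified_targets`, the corner order (top-left, top-right, bottom-right, bottom-left), and the 32-pixel target height are all hypothetical choices, and the right height depends on the recognition model.

```python
import numpy as np

def rectified_targets(quad, target_height=32):
    """Compute destination corners and output size for a text quadrilateral.

    quad: four (x, y) corners ordered top-left, top-right, bottom-right,
    bottom-left (an assumed convention). The crop is scaled so its height
    equals target_height while roughly preserving the aspect ratio.
    """
    quad = np.asarray(quad, dtype=np.float32)
    tl, tr, br, bl = quad
    width = float(max(np.linalg.norm(tr - tl), np.linalg.norm(br - bl)))
    height = float(max(np.linalg.norm(bl - tl), np.linalg.norm(br - tr)))
    if height <= 0.0:
        raise ValueError("degenerate quadrilateral: zero height")
    out_w = max(1, int(round(width * target_height / height)))
    dst = np.array(
        [[0, 0], [out_w - 1, 0], [out_w - 1, target_height - 1], [0, target_height - 1]],
        dtype=np.float32,
    )
    return dst, (out_w, target_height)
```

The returned corners and size can then be combined with the source quadrilateral (as a float32 array) in `cv2.getPerspectiveTransform(quad, dst)` and `cv2.warpPerspective(image, M, size)` to produce the rectified crop.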

Takeaways

  • Match engine to layout: printed forms vs street signs vs handwriting.
  • Evaluate with character/word accuracy on a held-out set.
  • Privacy: redact sensitive text, or avoid storing it, unless a data-handling policy covers it.
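Character and word accuracy are commonly reported through their complements, character error rate (CER) and word error rate (WER), both based on edit distance. The following is a minimal sketch; the function names and the (reference, hypothesis) pair format are assumptions made for this example.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            # deletion, insertion, substitution (or match)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]

def error_rates(pairs):
    """Return (CER, WER) over an iterable of (reference, hypothesis) strings."""
    char_err = char_total = word_err = word_total = 0
    for ref, hyp in pairs:
        char_err += edit_distance(ref, hyp)
        char_total += len(ref)
        ref_words, hyp_words = ref.split(), hyp.split()
        word_err += edit_distance(ref_words, hyp_words)
        word_total += len(ref_words)
    # max(..., 1) avoids division by zero on an empty evaluation set
    return char_err / max(char_total, 1), word_err / max(word_total, 1)
```

Accuracy can then be taken as one minus the error rate, computed only on held-out pairs that were not used for tuning thresholds or preprocessing.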

Quick FAQ

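For languages other than English, install the matching tessdata pack and pass a combined language code such as lang="hin+eng" (Hindi plus English, as an example) to pytesseract.

For handwriting, Tesseract is limited; use specialized handwritten text recognition (HTR) models or line-level sequence models trained on IAM-style corpora.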

Autonomous vehicles

Lane lines (classical sketch)

import cv2
import numpy as np

def region_of_interest(img, vertices):
    mask = np.zeros_like(img)
    cv2.fillPoly(mask, vertices, 255)
    return cv2.bitwise_and(img, mask)

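# frame_bgr: a single BGR video frame, for example from cv2.VideoCapture
gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (5, 5), 0)
edges = cv2.Canny(blur, 50, 150)
h, w = edges.shape
# Trapezoid covering the road ahead; fillPoly expects integer vertices
trapezoid = np.array([[(0, h), (w, h), (int(w * 0.55), int(h * 0.6)), (int(w * 0.45), int(h * 0.6))]], np.int32)
roi = region_of_interest(edges, trapezoid)
# HoughLinesP returns None when no segments are found
lines = cv2.HoughLinesP(roi, 1, np.pi / 180, threshold=50, minLineLength=40, maxLineGap=150)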
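One common way to use the Hough segments is to split them into left and right lane candidates by slope sign. The helper below is a sketch under assumptions: the name `split_by_slope` and the 0.5 minimum slope are illustrative, and it relies on image coordinates where y grows downward, so the left lane line has a negative slope.

```python
import numpy as np

def split_by_slope(lines, min_abs_slope=0.5):
    """Split HoughLinesP output into (left, right) lists of (x1, y1, x2, y2).

    Near-horizontal segments (|slope| below min_abs_slope) and exactly
    vertical ones are dropped; both choices are assumptions for this sketch.
    """
    left, right = [], []
    if lines is None:  # HoughLinesP found nothing
        return left, right
    for x1, y1, x2, y2 in np.asarray(lines).reshape(-1, 4):
        if x1 == x2:
            continue  # slope undefined
        slope = (y2 - y1) / (x2 - x1)
        if abs(slope) < min_abs_slope:
            continue
        segment = (int(x1), int(y1), int(x2), int(y2))
        (left if slope < 0 else right).append(segment)
    return left, right
```

Averaging or fitting a line through each group then gives one estimate per lane boundary for the current frame.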

A perspective transform to a bird’s-eye view followed by curve fitting is a common next step; learned lane-detection networks dominate results on modern datasets.
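The curve-fitting step can be illustrated with a second-order polynomial fit. This sketch assumes lane pixel coordinates have already been extracted from a bird’s-eye image; the function name `fit_lane_curve` is hypothetical. It fits x as a function of y because lane lines are close to vertical in that view.

```python
import numpy as np

def fit_lane_curve(xs, ys, y_eval):
    """Fit x = a*y**2 + b*y + c to lane pixels and evaluate the fit at y_eval.

    xs, ys: pixel coordinates of points believed to lie on one lane line.
    Returns the coefficients (a, b, c) and the fitted x at y_eval.
    """
    coeffs = np.polyfit(ys, xs, deg=2)
    return coeffs, np.polyval(coeffs, y_eval)
```

Evaluating the fit at the bottom row of the image gives the lane position nearest the vehicle, which is one way to estimate lateral offset.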

Objects and freespace

2D detection (cars, pedestrians, signs) uses YOLO-style or two-stage detectors on camera frames. Segmentation labels each pixel (road, sky, vehicle) for dense scene understanding. Depth from stereo or LiDAR projection aligns detections in 3D for planning.
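Two standard relations sit behind that last sentence: stereo depth from disparity, and pinhole projection of 3D points into the image. The sketch below assumes a rectified stereo pair, LiDAR points already transformed into the camera frame (x right, y down, z forward), and a known 3x3 intrinsic matrix K; the function names are illustrative.

```python
import numpy as np

def stereo_depth(disparity_px, focal_px, baseline_m):
    """Depth in meters from disparity: Z = f * B / d (rectified stereo)."""
    return focal_px * baseline_m / disparity_px

def project_points(points_cam, K):
    """Project (N, 3) camera-frame points to pixel coordinates.

    Returns the (M, 2) pixels of points in front of the camera and a boolean
    mask of length N marking which input points those are.
    """
    points_cam = np.asarray(points_cam, dtype=np.float64)
    K = np.asarray(K, dtype=np.float64)
    in_front = points_cam[:, 2] > 0
    uvw = points_cam[in_front] @ K.T      # homogeneous image coordinates
    pixels = uvw[:, :2] / uvw[:, 2:3]     # divide by depth
    return pixels, in_front
```

Projected LiDAR points that fall inside a 2D detection box can then supply a range estimate for that detection.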

Fusion and time

Multiple sensors and time steps are combined with Kalman or particle filters, or with learned trackers. Latency budgets and failure modes (glare, rain, occlusion) drive redundancy, because not everything in a scene is visible in a single frame.
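To make the filtering idea concrete, here is a minimal constant-velocity Kalman filter for one tracked coordinate. It is a sketch under assumptions, not a production tracker: the function name, the noise parameters, the initial covariance, and the use of `None` to mark a missed detection (for example during occlusion) are all illustrative choices, and the first measurement is assumed to be valid.

```python
import numpy as np

def kalman_track(measurements, dt=0.1, meas_var=1.0, accel_var=1.0):
    """Track position and velocity from noisy 1D position measurements.

    measurements: sequence of floats, with None for frames with no detection.
    Returns a list of (position, velocity) estimates, one per time step.
    """
    F = np.array([[1.0, dt], [0.0, 1.0]])            # constant-velocity motion
    H = np.array([[1.0, 0.0]])                       # only position is measured
    Q = accel_var * np.array([[dt**4 / 4, dt**3 / 2],
                              [dt**3 / 2, dt**2]])   # white-noise acceleration
    R = np.array([[meas_var]])
    x = np.array([[float(measurements[0])], [0.0]])  # start at first measurement
    P = np.eye(2) * 10.0                             # large initial uncertainty
    estimates = []
    for z in measurements:
        # Predict forward one time step
        x = F @ x
        P = F @ P @ F.T + Q
        # Update only when a measurement is available
        if z is not None:
            y = np.array([[float(z)]]) - H @ x
            S = H @ P @ H.T + R
            K = P @ H.T @ np.linalg.inv(S)
            x = x + K @ y
            P = (np.eye(2) - K @ H) @ P
        estimates.append((float(x[0, 0]), float(x[1, 0])))
    return estimates
```

During a missed detection the filter coasts on its prediction and its covariance grows, which is the quantity downstream logic can use to decide how much to trust the track.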

Takeaways

  • Treat perception outputs as uncertain; downstream planning should reason about risk.
  • Regulatory and ethical obligations exceed pure accuracy metrics.
  • Simulation and real-world testing within the ODD (operational design domain) are both essential.

Quick FAQ

Some prototypes map pixels to steering directly; most deployed stacks modularize perception, prediction, and control for interpretability and certification.

Robot Operating System (ROS) is widely used in research for message passing between camera drivers, detectors, and planners.
