Computer Vision Chapter 19

CV Libraries & Frameworks

OpenCV, PyTorch torchvision, and TensorFlow/Keras vision APIs for production workflows.

OpenCV

Read, show, write

import cv2

img = cv2.imread("photo.jpg", cv2.IMREAD_COLOR)  # BGR, None if missing
if img is None:
    raise FileNotFoundError("photo.jpg")
cv2.imshow("win", img)
cv2.waitKey(0)
cv2.destroyAllWindows()
cv2.imwrite("copy.png", img)

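Headless servers: imshow fails there (no display, or a headless OpenCV build without GUI support). Write results to files instead, or in Jupyter convert BGR→RGB and display with matplotlib.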

Arrays and color

import numpy as np

print(img.shape, img.dtype)  # (H, W, 3), uint8 typical
roi = img[100:200, 50:300]  # rows (y) 100-199, columns (x) 50-299; a view, not a copy
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
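The color conversions above are plain array operations. The following is a minimal NumPy sketch, assuming an 8-bit BGR image. COLOR_BGR2RGB amounts to reversing the channel axis. COLOR_BGR2GRAY is approximately the weighted sum 0.299 R + 0.587 G + 0.114 B; OpenCV uses fixed-point arithmetic, so its results can differ by one gray level. The helper names are illustrative, not OpenCV functions. The sketch also shows that slicing an ROI returns a view into the original image.

```python
import numpy as np

def bgr_to_rgb(img):
    """Same pixels as cv2.cvtColor(img, cv2.COLOR_BGR2RGB): reverse the channel axis.

    np.ascontiguousarray copies the reversed view into a normal memory layout;
    torch.from_numpy rejects arrays with negative strides.
    """
    return np.ascontiguousarray(img[..., ::-1])

def bgr_to_gray(img):
    """Approximates cv2.COLOR_BGR2GRAY with Y = 0.299 R + 0.587 G + 0.114 B."""
    b, g, r = (img[..., i].astype(np.float32) for i in range(3))
    y = 0.299 * r + 0.587 * g + 0.114 * b
    return np.clip(np.rint(y), 0, 255).astype(np.uint8)

# 2x2 BGR test image: blue, green / red, white
img = np.array([[[255, 0, 0], [0, 255, 0]],
                [[0, 0, 255], [255, 255, 255]]], dtype=np.uint8)
rgb = bgr_to_rgb(img)
gray = bgr_to_gray(img)

# Basic slicing returns a view: writing into roi also changes img
roi = img[0:1, 0:2]              # row 0, columns 0-1
roi[:] = 0                       # zeroes the top row of img as well
safe_roi = img[1:2, 0:2].copy()  # .copy() gives an independent crop
```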

Main modules

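  • core: the Mat array type (a NumPy array in Python) and basic array operations.
  • imgproc: filters, geometric transforms, color conversion, morphology, contours, histograms, drawing functions.
  • features2d / calib3d: keypoints and descriptors, homography, camera calibration, stereo.
  • videoio / video: capture and writing (videoio); optical flow, background subtraction, tracking helpers (video).
  • dnn: inference for ONNX, Caffe, TensorFlow, and legacy Torch7 models; PyTorch models are usually exported to ONNX first.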

Install notes

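Install only one of opencv-python, opencv-contrib-python, opencv-python-headless, or opencv-contrib-python-headless per environment. They all provide the same cv2 package and overwrite each other. The headless variants omit GUI functions such as imshow, which suits servers. The PyPI wheels are CPU-only, so for CUDA support you typically build from source with CUDA enabled or use vendor-provided wheels.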
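A quick way to spot the conflict is to list which OpenCV wheels are installed. The following is a small sketch using only the standard library's importlib.metadata. The helper name installed_opencv_dists is hypothetical. To check whether a given build has CUDA enabled, cv2.getBuildInformation() prints the build configuration.

```python
from importlib import metadata

# pip distributions that all install the same top-level cv2 package
OPENCV_DISTS = {
    "opencv-python",
    "opencv-contrib-python",
    "opencv-python-headless",
    "opencv-contrib-python-headless",
}

def installed_opencv_dists(dist_names=None):
    """Hypothetical helper: return the OpenCV wheels present (more than one means a conflict).

    dist_names defaults to every installed distribution; pass a list to test the logic.
    """
    if dist_names is None:
        dist_names = (d.metadata["Name"] for d in metadata.distributions())
    found = set()
    for name in dist_names:
        if not name:                      # broken metadata can yield None
            continue
        norm = name.lower().replace("_", "-")
        if norm in OPENCV_DISTS:
            found.add(norm)
    return sorted(found)

if __name__ == "__main__":
    dists = installed_opencv_dists()
    if len(dists) > 1:
        print("Conflicting OpenCV wheels:", ", ".join(dists))
    else:
        print("OpenCV wheels:", dists or "none")
```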

Takeaways

  • OpenCV loads and saves color images in BGR order; most ML stacks (PIL, torchvision, Keras) expect RGB, so convert explicitly.
  • OpenCV excels at classical CV and deployment-friendly inference.
  • Pair with PyTorch/TensorFlow when you need training loops.

Quick FAQ

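cv2.imread returns None instead of raising when the path is wrong, the path contains non-ASCII characters on Windows, or the format is unsupported. Check the path with os.path.exists before suspecting the decoder.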
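A common workaround for the Windows path issue is to read the raw bytes with NumPy and decode them with cv2.imdecode, so Python rather than OpenCV opens the file. The following is a sketch of such a wrapper. The name imread_unicode is hypothetical. It raises exceptions rather than returning None, and it imports cv2 lazily so the path check works even where OpenCV is not installed.

```python
import os
import numpy as np

def imread_unicode(path, flags=None):
    """Hypothetical helper: like cv2.imread, but raises instead of returning None.

    Reading the bytes with NumPy lets Python handle the path, which avoids
    cv2.imread's trouble with non-ASCII paths on Windows.
    """
    if not os.path.isfile(path):
        raise FileNotFoundError(f"no such file: {path!r}")
    import cv2  # imported here so the path check above works even without OpenCV
    if flags is None:
        flags = cv2.IMREAD_COLOR
    data = np.fromfile(path, dtype=np.uint8)  # raw encoded bytes (JPEG, PNG, ...)
    img = cv2.imdecode(data, flags)           # returns None if the bytes cannot be decoded
    if img is None:
        raise ValueError(f"could not decode {path!r}: unsupported format or corrupt file")
    return img
```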

PIL vs. OpenCV: PIL is convenient for simple loading and transforms; OpenCV is usually faster for video I/O, geometric operations, and DNN pre/post-processing.

PyTorch Vision (torchvision)

ImageFolder + DataLoader

from torchvision.datasets import ImageFolder
from torchvision import transforms
from torch.utils.data import DataLoader

transform = transforms.Compose([
    transforms.Resize(256),          # shorter side -> 256 px, aspect ratio kept
    transforms.CenterCrop(224),
    transforms.ToTensor(),           # HWC uint8 [0, 255] -> CHW float [0, 1]
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),  # ImageNet stats
])
ds = ImageFolder("data/train", transform=transform)
loader = DataLoader(ds, batch_size=32, shuffle=True, num_workers=4)

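Directory layout: ImageFolder expects one subfolder per class (e.g. data/train/cat/, data/train/dog/). ds.classes lists the folder names in sorted order, and ds.class_to_idx maps each name to its integer label.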
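The label assignment can be reproduced with the standard library. The following is a simplified sketch that roughly mirrors ImageFolder's behavior: class names are the sorted subdirectory names, labels are their positions in that sorted list, and only files with common image extensions are kept. The function name scan_image_folder and the exact extension list are assumptions for illustration, not torchvision's API.

```python
import os

# Assumed extension list (similar to torchvision's default); matched case-insensitively
IMG_EXTENSIONS = (".jpg", ".jpeg", ".png", ".ppm", ".bmp", ".pgm", ".tif", ".tiff", ".webp")

def scan_image_folder(root):
    """Hypothetical sketch: map root/<class_name>/**/<image> to (path, label) pairs."""
    classes = sorted(entry.name for entry in os.scandir(root) if entry.is_dir())
    class_to_idx = {name: i for i, name in enumerate(classes)}
    samples = []
    for name in classes:
        class_dir = os.path.join(root, name)
        # Walk nested folders too; sorting makes the sample order deterministic
        for dirpath, _, fnames in sorted(os.walk(class_dir)):
            for fname in sorted(fnames):
                if fname.lower().endswith(IMG_EXTENSIONS):
                    samples.append((os.path.join(dirpath, fname), class_to_idx[name]))
    return classes, class_to_idx, samples
```

Because labels come from sorted folder names, adding a new class folder can shift the integer labels of existing classes. Save class_to_idx alongside trained weights.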

Models and Weights

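import torch
from PIL import Image
from torchvision.models import resnet50, ResNet50_Weights

weights = ResNet50_Weights.IMAGENET1K_V2
model = resnet50(weights=weights).eval()  # eval(): inference behavior for BatchNorm/Dropout
preprocess = weights.transforms()         # the resize/crop/normalize these weights expect

img = Image.open("photo.jpg").convert("RGB")
x = preprocess(img).unsqueeze(0)          # add a batch dimension: (1, 3, H, W)
with torch.no_grad():
    logits = model(x)
probs = logits.softmax(dim=1)
top = probs.argmax(dim=1).item()
print(weights.meta["categories"][top], probs[0, top].item())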

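weights.transforms() bundles a resize, a center crop, conversion to float, and ImageNet mean/std normalization. The following NumPy sketch shows what the crop-and-normalize part computes, assuming an image that has already been resized and is in RGB order. The helper names are hypothetical. The resize and crop sizes that a given weights object uses can differ from 256/224, so for real inference call weights.transforms() rather than re-implementing it.

```python
import numpy as np

IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def center_crop(img, size):
    """Hypothetical helper: central size x size region of an HWC array."""
    h, w = img.shape[:2]
    top = (h - size) // 2
    left = (w - size) // 2
    return img[top:top + size, left:left + size]

def to_normalized_chw(img_rgb_uint8):
    """Mimics ToTensor + Normalize: HWC uint8 RGB -> CHW float32 with (x / 255 - mean) / std."""
    x = img_rgb_uint8.astype(np.float32) / 255.0
    x = (x - IMAGENET_MEAN) / IMAGENET_STD  # broadcasts over the last (channel) axis
    return np.ascontiguousarray(x.transpose(2, 0, 1))

# Hypothetical already-resized 256x256 RGB image filled with mid-gray
img = np.full((256, 256, 3), 128, dtype=np.uint8)
x = to_normalized_chw(center_crop(img, 224))[np.newaxis]  # batch axis -> (1, 3, 224, 224)
```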
Transforms v2 (brief)

The torchvision.transforms.v2 API (introduced as beta in torchvision 0.15) accepts tensors or PIL images and transforms bounding boxes and masks consistently with the image. Prefer it for detection and segmentation training if your installed version includes it.
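Why joint transforms matter: flipping an image without flipping its boxes silently corrupts the labels. The following NumPy sketch shows the bookkeeping that a horizontal flip needs for XYXY pixel boxes (x1' = W - x2, x2' = W - x1). It illustrates the idea rather than showing torchvision's implementation. In v2 you would compose something like v2.RandomHorizontalFlip and, in recent versions, wrap boxes as torchvision.tv_tensors.BoundingBoxes so they are updated automatically.

```python
import numpy as np

def hflip_image_and_boxes(img, boxes):
    """Flip an HWC image left-right and remap (N, 4) XYXY pixel boxes to match."""
    w = img.shape[1]
    flipped = np.ascontiguousarray(img[:, ::-1])
    x1, y1, x2, y2 = boxes.T
    # Left and right edges swap roles after the flip; y coordinates are unchanged
    new_boxes = np.stack([w - x2, y1, w - x1, y2], axis=1)
    return flipped, new_boxes
```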

Takeaways

  • Always match preprocessing to the weights you load.
  • Use torchvision.ops (e.g. nms, batched_nms, roi_align, box_iou) in detection heads rather than hand-rolled versions.
  • Pin torch and torchvision versions that are tested together.
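For reference, the algorithm behind torchvision.ops.nms is greedy: keep the highest-scoring box, discard remaining boxes whose IoU with it exceeds the threshold, and repeat. The following NumPy sketch assumes XYXY boxes with positive area. It is for understanding and testing only, since the torchvision op runs on GPU tensors and is much faster.

```python
import numpy as np

def box_iou_one_to_many(box, boxes):
    """IoU between one XYXY box and an (N, 4) array of XYXY boxes."""
    ix1 = np.maximum(box[0], boxes[:, 0])
    iy1 = np.maximum(box[1], boxes[:, 1])
    ix2 = np.minimum(box[2], boxes[:, 2])
    iy2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(ix2 - ix1, 0, None) * np.clip(iy2 - iy1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def nms(boxes, scores, iou_threshold):
    """Greedy NMS sketch; returns kept indices in decreasing score order."""
    order = np.argsort(-scores, kind="stable")
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        ious = box_iou_one_to_many(boxes[best], boxes[rest])
        order = rest[ious <= iou_threshold]  # drop boxes overlapping the kept one too much
    return np.array(keep, dtype=np.int64)
```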

Quick FAQ

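CUDA out-of-memory: lower the batch size, use mixed precision (torch.autocast with a gradient scaler; older code uses torch.cuda.amp), accumulate gradients over several smaller batches, or switch to a smaller model.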
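Gradient accumulation works because averaging the gradients of equal-size micro-batches reproduces the full-batch gradient of a mean loss. In PyTorch the usual pattern is to call (loss / accum_steps).backward() on each micro-batch and call optimizer.step() and optimizer.zero_grad() every accum_steps batches. The following NumPy sketch checks the equivalence on a linear least-squares loss. The equivalence does not extend to BatchNorm statistics, which still see only the micro-batch.

```python
import numpy as np

def mse_grad(w, X, y):
    """Gradient of 0.5 * mean((X @ w - y) ** 2) with respect to w."""
    return X.T @ (X @ w - y) / len(y)

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 3))
y = rng.normal(size=32)
w = np.zeros(3)

full = mse_grad(w, X, y)  # one batch of 32 samples

accum_steps = 4           # four micro-batches of 8 samples each
acc = np.zeros_like(w)
for Xb, yb in zip(np.split(X, accum_steps), np.split(y, accum_steps)):
    acc += mse_grad(w, Xb, yb) / accum_steps  # analogous to (loss / accum_steps).backward()
```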

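Variable-size images or detection targets: the default collate function stacks tensors and fails on mismatched shapes. Pass a custom collate_fn to DataLoader that pads images to a common size or keeps per-image target dicts in a list.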
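The following is a sketch of a padding collate function, written with NumPy so it runs standalone. With PyTorch tensors, the same logic applies using torch.zeros and slicing. The name pad_collate and the choice to pad with zeros at the bottom and right are assumptions for illustration. For detection, torchvision's reference scripts use an even simpler collate that returns tuple(zip(*batch)), leaving images and targets as tuples.

```python
import numpy as np

def pad_collate(batch):
    """Hypothetical collate_fn: zero-pad CHW images to the batch's max H and W.

    batch is a list of (image, target) pairs; targets are returned as a list
    because detection targets (e.g. dicts of boxes) differ in size per image.
    """
    images, targets = zip(*batch)
    channels = images[0].shape[0]
    max_h = max(im.shape[1] for im in images)
    max_w = max(im.shape[2] for im in images)
    out = np.zeros((len(images), channels, max_h, max_w), dtype=images[0].dtype)
    for i, im in enumerate(images):
        out[i, :, :im.shape[1], :im.shape[2]] = im  # pad bottom/right with zeros
    return out, list(targets)
```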

TensorFlow vision

Keras Applications

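import tensorflow as tf

num_classes = 5  # hypothetical: set to the number of labels in your dataset

base = tf.keras.applications.ResNet50(
    include_top=False,        # drop the 1000-class ImageNet head so it can be replaced
    weights="imagenet",
    input_shape=(224, 224, 3),
    pooling="avg",            # global average pooling -> (batch, 2048) features
)
base.trainable = False        # freeze the backbone (feature extraction)

inputs = tf.keras.Input(shape=(224, 224, 3))
x = tf.keras.applications.resnet50.preprocess_input(inputs)
x = base(x, training=False)   # keep BatchNorm layers in inference mode
outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
model = tf.keras.Model(inputs, outputs)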

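Use the preprocess_input from the same module as the architecture (tf.keras.applications.resnet50 vs. tf.keras.applications.efficientnet). ResNet50 expects mean-subtracted BGR input, while the Keras EfficientNet models take raw [0, 255] RGB and rescale internally.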
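Most Keras Applications preprocess_input functions fall into one of three modes: "caffe" (e.g. ResNet50), "tf" (e.g. MobileNetV2), and "torch" (e.g. DenseNet). The following NumPy sketch shows what each mode computes on RGB input in [0, 255]. The function names are hypothetical, and in real code you should call the module's own preprocess_input.

```python
import numpy as np

IMAGENET_MEAN_BGR = np.array([103.939, 116.779, 123.68], dtype=np.float32)
IMAGENET_MEAN_RGB = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD_RGB = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess_caffe(x):
    """Like 'caffe' mode: RGB -> BGR, subtract the ImageNet mean, no scaling."""
    x = np.asarray(x, dtype=np.float32)[..., ::-1]
    return x - IMAGENET_MEAN_BGR

def preprocess_tf(x):
    """Like 'tf' mode: scale [0, 255] to [-1, 1]."""
    return np.asarray(x, dtype=np.float32) / 127.5 - 1.0

def preprocess_torch(x):
    """Like 'torch' mode: scale to [0, 1], then per-channel mean/std normalization."""
    x = np.asarray(x, dtype=np.float32) / 255.0
    return (x - IMAGENET_MEAN_RGB) / IMAGENET_STD_RGB
```

Feeding a model input prepared with the wrong mode usually does not raise an error; accuracy just drops, which is why matching the module matters.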

tf.data from file paths

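import tensorflow as tf

# Hypothetical inputs: parallel lists of image file paths and integer labels
paths = ["data/train/cat/0001.jpg", "data/train/dog/0001.jpg"]
labels = [0, 1]

def load_decode(path, label):
    img = tf.io.read_file(path)
    img = tf.image.decode_jpeg(img, channels=3)  # for mixed formats consider tf.io.decode_image
    img = tf.image.resize(img, [224, 224])       # float32 output, values still in [0, 255]
    return img, label

paths_ds = tf.data.Dataset.from_tensor_slices((paths, labels))
ds = paths_ds.shuffle(len(paths))                # shuffle cheap path strings before decoding
ds = ds.map(load_decode, num_parallel_calls=tf.data.AUTOTUNE)
ds = ds.batch(32).prefetch(tf.data.AUTOTUNE)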

KerasCV (optional)

pip install keras-cv adds detection models, augmentation layers, and COCO-style metrics for Keras 3 multi-backend workflows. Check its compatibility matrix against your TensorFlow/Keras versions; newer pretrained vision models are increasingly published in KerasHub instead.

Takeaways

  • Keep training and serving preprocessing identical when possible (layers or SavedModel signatures).
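  • Mixed precision: call tf.keras.mixed_precision.set_global_policy("mixed_float16") on supported GPUs (compute capability 7.0+); constructing a Policy object alone does not enable it. Keep the model's final output in float32 for numeric stability.
  • Choose between TensorFlow and PyTorch based on team skills and deployment targets (e.g. TF Lite on mobile, Edge TPU); Keras 3 can also run on JAX or PyTorch backends.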

Quick FAQ

Tensor layout: Keras in TensorFlow defaults to NHWC (channels last) on both CPU and GPU, while PyTorch uses NCHW. Some TensorFlow GPU ops also accept NCHW, which can help performance in specific cases.
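Moving arrays between the two conventions (for example, comparing a Keras model's input batch with a PyTorch model's) is a single axis permutation. The following is a minimal NumPy sketch; the helper names are illustrative.

```python
import numpy as np

def nhwc_to_nchw(x):
    """(N, H, W, C) -> (N, C, H, W), e.g. a Keras-style batch to a PyTorch-style batch."""
    return np.ascontiguousarray(x.transpose(0, 3, 1, 2))

def nchw_to_nhwc(x):
    """(N, C, H, W) -> (N, H, W, C)."""
    return np.ascontiguousarray(x.transpose(0, 2, 3, 1))
```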

Smaller on-device models: convert to TensorFlow Lite with post-training quantization, calibrating activation ranges on a representative dataset of real inputs.
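The calibration step exists because int8 quantization needs a value range for each tensor. In the converter this is typically set up with converter.optimizations = [tf.lite.Optimize.DEFAULT] and a converter.representative_dataset generator. The following NumPy sketch shows the underlying arithmetic under simplifying assumptions: per-tensor asymmetric int8, with the range taken as the observed min/max widened to include zero. TFLite's actual scheme (e.g. per-channel symmetric weights) differs in details.

```python
import numpy as np

QMIN, QMAX = -128, 127  # int8 range

def calibrate(samples):
    """Scale and zero-point from the observed min/max of calibration data."""
    rmin = min(0.0, float(np.min(samples)))  # range must contain 0 so 0.0 is exact
    rmax = max(0.0, float(np.max(samples)))
    scale = (rmax - rmin) / (QMAX - QMIN)
    if scale == 0.0:                         # all-zero calibration data
        scale = 1.0
    zero_point = int(np.clip(round(QMIN - rmin / scale), QMIN, QMAX))
    return scale, zero_point

def quantize(x, scale, zero_point):
    """float -> int8: q = round(x / scale) + zero_point, clipped to the int8 range."""
    q = np.round(np.asarray(x, dtype=np.float64) / scale) + zero_point
    return np.clip(q, QMIN, QMAX).astype(np.int8)

def dequantize(q, scale, zero_point):
    """int8 -> float: x ≈ (q - zero_point) * scale."""
    return (np.asarray(q, dtype=np.float64) - zero_point) * scale

calib = np.linspace(-1.0, 3.0, 101)  # stand-in for activations seen on a representative dataset
scale, zp = calibrate(calib)
recon = dequantize(quantize(calib, scale, zp), scale, zp)
```

Values outside the calibrated range are clipped, which is why the representative dataset should cover the inputs the model will actually see.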
