CV Libraries & Frameworks

OpenCV

Read, show, write

import cv2

img = cv2.imread("photo.jpg", cv2.IMREAD_COLOR)  # BGR, None if missing
if img is None:
    raise FileNotFoundError("photo.jpg")
cv2.imshow("win", img)
cv2.waitKey(0)
cv2.destroyAllWindows()
cv2.imwrite("copy.png", img)

On headless servers (no display), avoid imshow. Save files instead, or display in Jupyter with matplotlib after converting BGR→RGB.

Arrays and color

import numpy as np

print(img.shape, img.dtype)  # (H, W, 3), uint8 typical
roi = img[100:200, 50:300]
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

Main modules

core — basic data structures (Mat/arrays) and array operations.
imgproc — filters, morphology, contours, histograms, drawing functions.
features2d / calib3d — keypoints, homography, stereo.
videoio / video — capture, optical flow, tracking helpers.
dnn — run ONNX/Caffe/TF/Torch models (PyTorch models are typically exported to ONNX first).

Install notes

Install only one of opencv-python, opencv-contrib-python, or their -headless variants per environment. They all provide the same cv2 module and conflict with each other. For CUDA-enabled builds, you typically compile from source or use vendor-provided wheels.
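The following sketch spots the conflict using the standard library's importlib.metadata (Python 3.8+). The set lists the four official pip distributions of OpenCV. The helper name installed_opencv_dists is hypothetical.

```python
from importlib import metadata

# The four official pip distributions that all ship the cv2 module
OPENCV_DISTS = {
    "opencv-python",
    "opencv-contrib-python",
    "opencv-python-headless",
    "opencv-contrib-python-headless",
}

def installed_opencv_dists():
    """Return the sorted names of OpenCV pip distributions in this environment."""
    found = set()
    for dist in metadata.distributions():
        # Normalize names such as "opencv_python" to "opencv-python"
        name = (dist.metadata["Name"] or "").lower().replace("_", "-")
        if name in OPENCV_DISTS:
            found.add(name)
    return sorted(found)

dists = installed_opencv_dists()
if len(dists) > 1:
    print("Conflicting OpenCV packages installed:", ", ".join(dists))
```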

Takeaways
BGR is OpenCV's default channel order, while many ML stacks expect RGB, so convert explicitly.
OpenCV excels at classical CV and deployment-friendly inference.
Pair it with PyTorch/TensorFlow when you need training loops.

Quick FAQ

imread returns None?
Usually this means a wrong path, a non-ASCII (Unicode) path on Windows, or an unsupported format. Check the path with os.path.exists first.

OpenCV vs PIL?
PIL is convenient for simple transforms. In many workflows, OpenCV is faster for video, geometric operations, and DNN pre/post-processing.

PyTorch Vision (torchvision)

ImageFolder + DataLoader

from torchvision.datasets import ImageFolder
from torchvision import transforms
from torch.utils.data import DataLoader

tf = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
ds = ImageFolder("data/train", transform=tf)
loader = DataLoader(ds, batch_size=32, shuffle=True, num_workers=4)

Directory layout: one subfolder per class. ds.classes lists the class names in sorted order, and ds.class_to_idx maps each name to its integer label.

Models and `Weights`

import torch
from PIL import Image
from torchvision.models import resnet50, ResNet50_Weights

weights = ResNet50_Weights.IMAGENET1K_V2
model = resnet50(weights=weights).eval()
preprocess = weights.transforms()  # resize/crop/normalize matching these weights

img = Image.open("photo.jpg").convert("RGB")
x = preprocess(img).unsqueeze(0)   # add batch dim: (1, 3, 224, 224)
with torch.no_grad():
    logits = model(x)

Transforms v2 (brief)

Newer APIs under torchvision.transforms.v2 accept tensors or PIL images and apply geometric transforms to bounding boxes and masks consistently with the image. Prefer them for detection/segmentation training when your torchvision version includes them.

Takeaways
Always match preprocessing to the weights you load.
Use torchvision.ops for NMS, ROI align, etc., in detection heads.
Pin torch and torchvision versions that are tested together.

Quick FAQ

CUDA OOM?
Lower the batch size, use mixed precision (torch.autocast with a gradient scaler; older code uses torch.cuda.amp), use gradient accumulation, or switch to a smaller model.

Custom collate?
Pass a collate_fn to DataLoader to pad variable-size images, or to keep per-image dict targets as a list for detection.
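Here is a sketch of a collate_fn for detection-style samples of (image, target dict). Images are zero-padded on the bottom and right to the largest size in the batch, which keeps box coordinates valid. Targets stay a list because their lengths differ. pad_collate is a hypothetical name.

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader

def pad_collate(batch):
    """Pad (C, H, W) images to the batch's max H and W; keep targets as a list."""
    images, targets = zip(*batch)
    max_h = max(img.shape[1] for img in images)
    max_w = max(img.shape[2] for img in images)
    # F.pad takes (left, right, top, bottom) for the last two dims
    padded = [F.pad(img, (0, max_w - img.shape[2], 0, max_h - img.shape[1])) for img in images]
    return torch.stack(padded), list(targets)

# Toy detection samples: variable-size images with dict targets
samples = [
    (torch.rand(3, 60, 80), {"boxes": torch.tensor([[5.0, 5.0, 20.0, 30.0]]), "labels": torch.tensor([1])}),
    (torch.rand(3, 72, 64), {"boxes": torch.zeros((0, 4)), "labels": torch.zeros((0,), dtype=torch.int64)}),
]
loader = DataLoader(samples, batch_size=2, collate_fn=pad_collate)
images, targets = next(iter(loader))
print(images.shape, len(targets))  # torch.Size([2, 3, 72, 80]) 2
```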

TensorFlow vision

Keras Applications

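import tensorflow as tf

num_classes = 10  # placeholder: set to your dataset's class count

base = tf.keras.applications.ResNet50(
    include_top=False,          # drop the 1000-class ImageNet head
    weights="imagenet",
    input_shape=(224, 224, 3),
    pooling="avg",              # global average pooling -> (batch, 2048)
)
base.trainable = False
# New classification head for num_classes:
x = tf.keras.layers.Dense(num_classes, activation="softmax")(base.output)
model = tf.keras.Model(base.input, x)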

Use the preprocess_input from the same module as the architecture (e.g., tf.keras.applications.resnet50.preprocess_input for ResNet50). ResNet and EfficientNet expect different input scaling.
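To see why the module matters, the sketch below compares the two functions on the same fake input. According to the Keras docs, ResNet50 uses caffe-style preprocessing (RGB→BGR, then subtract the ImageNet channel means, with no scaling). EfficientNet's preprocess_input is a pass-through because rescaling is built into that model.

```python
import numpy as np
import tensorflow as tf

x = np.full((1, 2, 2, 3), 100.0, dtype=np.float32)  # fake RGB batch in [0, 255]

# ResNet50: RGB -> BGR, then subtract ImageNet channel means (caffe mode)
res = tf.keras.applications.resnet50.preprocess_input(x.copy())
# EfficientNet: pass-through, since rescaling lives inside the model
eff = tf.keras.applications.efficientnet.preprocess_input(x.copy())

print(np.asarray(res)[0, 0, 0])  # approx. [-3.939 -16.779 -23.68]
print(np.asarray(eff)[0, 0, 0])  # [100. 100. 100.]
```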

tf.data from file paths

import tensorflow as tf

# paths: list of image file paths; labels: matching integer class ids
def load_decode(path, label):
    img = tf.io.read_file(path)
    img = tf.image.decode_jpeg(img, channels=3)  # JPEG only
    img = tf.image.resize(img, [224, 224])       # returns float32
    return img, label

paths_ds = tf.data.Dataset.from_tensor_slices((paths, labels))
ds = paths_ds.map(load_decode, num_parallel_calls=tf.data.AUTOTUNE)
ds = ds.batch(32).prefetch(tf.data.AUTOTUNE)
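The following end-to-end sketch runs on two synthetic JPEGs written to a temp directory. It swaps in tf.io.decode_image (with expand_animations=False so the result stays 3-D) so that PNG or BMP files would also decode. That swap is a choice made here, not a requirement.

```python
import os
import tempfile

import tensorflow as tf

# Write two small synthetic JPEGs so the pipeline has real files to read
tmp = tempfile.mkdtemp()
paths, labels = [], []
for i in range(2):
    path = os.path.join(tmp, f"img_{i}.jpg")
    pixels = tf.fill([64, 48, 3], tf.constant(i * 100, dtype=tf.uint8))
    tf.io.write_file(path, tf.io.encode_jpeg(pixels))
    paths.append(path)
    labels.append(i)

def load_decode(path, label):
    img = tf.io.read_file(path)
    # decode_image handles JPEG/PNG/GIF/BMP; expand_animations=False keeps rank 3
    img = tf.io.decode_image(img, channels=3, expand_animations=False)
    img = tf.image.resize(img, [224, 224])  # float32 output
    return img, label

ds = (tf.data.Dataset.from_tensor_slices((paths, labels))
      .shuffle(len(paths))
      .map(load_decode, num_parallel_calls=tf.data.AUTOTUNE)
      .batch(32)
      .prefetch(tf.data.AUTOTUNE))
images, batch_labels = next(iter(ds))
print(images.shape, images.dtype)  # (2, 224, 224, 3) <dtype: 'float32'>
```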

KerasCV (optional)

pip install keras-cv adds detection models, augmentations, and COCO-style metrics aligned with Keras 3 / multi-backend workflows—check the version matrix against your TensorFlow install.

Takeaways
Keep training and serving preprocessing identical when possible (preprocessing layers or SavedModel signatures).
Mixed precision: call tf.keras.mixed_precision.set_global_policy("mixed_float16") on supported GPUs. Constructing a Policy object alone does not enable it.
Compare with PyTorch on team familiarity and deployment targets (e.g., TF Lite, Edge TPU).
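One way to keep training and serving preprocessing identical is to put it inside the model as layers. The sketch below uses a toy backbone, and num_classes = 5 is hypothetical. The float32 output layer follows the usual advice for when a mixed_float16 policy is active.

```python
import numpy as np
import tensorflow as tf

num_classes = 5  # hypothetical class count for the sketch

# Resize + rescale live inside the model, so serving reuses the exact same steps
inputs = tf.keras.Input(shape=(None, None, 3))  # any image size, RGB in [0, 255]
x = tf.keras.layers.Resizing(224, 224)(inputs)
x = tf.keras.layers.Rescaling(1.0 / 255)(x)
x = tf.keras.layers.Conv2D(8, 3, activation="relu")(x)  # toy backbone
x = tf.keras.layers.GlobalAveragePooling2D()(x)
# Keep the output layer in float32 (recommended under a mixed_float16 policy)
outputs = tf.keras.layers.Dense(num_classes, activation="softmax", dtype="float32")(x)
model = tf.keras.Model(inputs, outputs)

raw = np.random.randint(0, 256, size=(2, 300, 400, 3)).astype("float32")
print(model(raw).shape)  # (2, 5)
```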

Quick FAQ

Channels last vs first?
In TensorFlow, Keras defaults to NHWC (channels last) on both CPU and GPU. Some ops also support NCHW (channels first) on GPU for performance. PyTorch uses NCHW.
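Moving data between the two layouts is a transpose, as in the NumPy sketch below. np.transpose returns a view, so call np.ascontiguousarray if a downstream library needs contiguous memory.

```python
import numpy as np

nhwc = np.zeros((8, 224, 224, 3), dtype=np.float32)  # TensorFlow/Keras default
nchw = np.transpose(nhwc, (0, 3, 1, 2))              # PyTorch convention
back = np.transpose(nchw, (0, 2, 3, 1))              # and back again
print(nchw.shape, back.shape)  # (8, 3, 224, 224) (8, 224, 224, 3)
```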

Export for mobile?
Convert to TensorFlow Lite. For smaller on-device models, apply post-training quantization calibrated on a representative dataset.
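The sketch below applies post-training quantization with a representative dataset. TinyNet is a hypothetical stand-in model. For Keras models, tf.lite.TFLiteConverter.from_keras_model is the usual entry point, but converting from a concrete function keeps this example independent of the Keras version. The random calibration data is only for illustration; use real preprocessed samples instead.

```python
import numpy as np
import tensorflow as tf

class TinyNet(tf.Module):
    """Hypothetical stand-in model: global average pool + linear layer."""

    def __init__(self):
        super().__init__()
        self.w = tf.Variable(tf.random.normal([3, 4]))

    @tf.function(input_signature=[tf.TensorSpec([1, 32, 32, 3], tf.float32)])
    def __call__(self, x):
        pooled = tf.reduce_mean(x, axis=[1, 2])  # (1, 3)
        return tf.matmul(pooled, self.w)         # (1, 4)

def representative_data():
    # Calibration samples; in practice yield ~100 real preprocessed inputs
    for _ in range(20):
        yield [np.random.rand(1, 32, 32, 3).astype(np.float32)]

net = TinyNet()
converter = tf.lite.TFLiteConverter.from_concrete_functions(
    [net.__call__.get_concrete_function()], net)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
tflite_bytes = converter.convert()

# Load the converted model to confirm it is usable
interpreter = tf.lite.Interpreter(model_content=tflite_bytes)
interpreter.allocate_tensors()
print(len(tflite_bytes), interpreter.get_output_details()[0]["shape"])
```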
