OpenCV
Read, show, write
import cv2
img = cv2.imread("photo.jpg", cv2.IMREAD_COLOR) # BGR, None if missing
if img is None:
    raise FileNotFoundError("photo.jpg")
cv2.imshow("win", img)
cv2.waitKey(0)
cv2.destroyAllWindows()
cv2.imwrite("copy.png", img)
Headless servers: avoid imshow, since there is no display to open a window on; write files with imwrite, or show images in Jupyter with matplotlib after converting BGR→RGB.
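One way to preview an image without a display is sketched below. It assumes matplotlib and NumPy are available and uses a synthetic array in place of an image returned by cv2.imread; the temporary output path is made up for the example. The channel reversal bgr[..., ::-1] produces the same pixel values as cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB).

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: renders to files, needs no display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in for an image from cv2.imread: solid blue, stored in BGR order.
bgr = np.zeros((40, 60, 3), dtype=np.uint8)
bgr[..., 0] = 255  # channel 0 is blue in BGR

# matplotlib expects RGB, so reverse the channel axis before plotting.
rgb = bgr[..., ::-1]

# Hypothetical output location for this example.
out_path = os.path.join(tempfile.mkdtemp(), "preview.png")
fig, ax = plt.subplots()
ax.imshow(rgb)
ax.axis("off")
fig.savefig(out_path)
plt.close(fig)
```

In a Jupyter notebook the backend line is normally unnecessary, because the notebook supplies its own inline backend.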
Arrays and color
import numpy as np
print(img.shape, img.dtype) # (H, W, 3), uint8 typical
roi = img[100:200, 50:300]
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
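Because OpenCV images in Python are NumPy arrays, ROI slicing follows NumPy rules: rows (y) come first, then columns (x), and a slice is a view onto the original pixels. The sketch below uses a synthetic array instead of a loaded image to show both points.

```python
import numpy as np

# Synthetic stand-in for a loaded image: height 300, width 400, 3 channels.
img = np.zeros((300, 400, 3), dtype=np.uint8)

# Slicing is [rows, cols]: y from 100 to 199, x from 50 to 299.
roi = img[100:200, 50:300]
print(roi.shape)  # (100, 250, 3)

# The slice is a view, so writing to it changes the original image.
roi[:] = 255
touched = int(img[150, 100, 0])

# Take a copy when the ROI should be edited independently.
detached = img[100:200, 50:300].copy()
detached[:] = 0
unchanged = int(img[150, 100, 0])
```

After this runs, touched and unchanged are both 255: the write through roi reached img, while the write to detached did not.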
Main modules
- core — matrices, drawing primitives.
- imgproc — filters, morphology, contours, histograms.
- features2d / calib3d — keypoints, homography, stereo.
- videoio / video — capture, optical flow, tracking helpers.
- dnn — run ONNX/Caffe/TF/Torch models.
Install notes
Do not install opencv-python and opencv-contrib-python in the same environment—they conflict. For CUDA-enabled builds you typically compile from source or use vendor-provided wheels.
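A quick way to spot such a conflict is to list the installed distributions. The sketch below uses only the standard library; the helper name installed_opencv_wheels is made up for this example, and the set of names covers the four OpenCV wheels published on PyPI, all of which provide the same cv2 module.

```python
from importlib import metadata

# The four OpenCV wheels on PyPI; only one should be installed per environment.
OPENCV_WHEELS = {
    "opencv-python",
    "opencv-contrib-python",
    "opencv-python-headless",
    "opencv-contrib-python-headless",
}


def installed_opencv_wheels(names=None):
    """Return the sorted OpenCV wheel names found in `names`.

    When `names` is None, inspect the current environment instead.
    """
    if names is None:
        names = [dist.metadata["Name"] for dist in metadata.distributions()]
    # Normalize case and underscores so "opencv_python" matches "opencv-python".
    normalized = {n.lower().replace("_", "-") for n in names if n}
    return sorted(normalized & OPENCV_WHEELS)


found = installed_opencv_wheels()
if len(found) > 1:
    print("Conflicting OpenCV packages:", ", ".join(found))
else:
    print("OpenCV packages found:", found)
```

If more than one name is reported, a common remedy is to uninstall all of them and reinstall the single variant you need.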
Takeaways
- BGR default; many ML stacks expect RGB—convert explicitly.
- OpenCV excels at classical CV and deployment-friendly inference.
- Pair with PyTorch/TensorFlow when you need training loops.
Quick FAQ
- If imread returns None, confirm the file path with os.path.exists.
PyTorch Vision (torchvision)
ImageFolder + DataLoader
from torchvision.datasets import ImageFolder
from torchvision import transforms
from torch.utils.data import DataLoader
tf = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
ds = ImageFolder("data/train", transform=tf)
loader = DataLoader(ds, batch_size=32, shuffle=True, num_workers=4)
Directory layout: one subfolder per class name; ds.classes lists labels.
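The layout rule can be illustrated without torchvision. The sketch below assumes that class names are the sorted subfolder names and that integer labels follow that order; list_classes is a hypothetical stand-in written for this example, not the library's own code, and the class folders are created in a temporary directory.

```python
import os
import tempfile


def list_classes(root):
    """Map each subfolder of `root` to an integer label, in sorted name order."""
    classes = sorted(entry.name for entry in os.scandir(root) if entry.is_dir())
    class_to_idx = {name: index for index, name in enumerate(classes)}
    return classes, class_to_idx


# Build a throwaway data/train-style tree with one folder per class.
root = tempfile.mkdtemp()
for name in ("dog", "cat", "bird"):
    os.makedirs(os.path.join(root, name))

classes, class_to_idx = list_classes(root)
print(classes)       # ['bird', 'cat', 'dog']
print(class_to_idx)  # {'bird': 0, 'cat': 1, 'dog': 2}
```

Labels depend on folder names rather than creation order, so renaming or adding a class folder can shift the indices of existing classes.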
Models and Weights
import torch
from torchvision.models import resnet50, ResNet50_Weights
weights = ResNet50_Weights.IMAGENET1K_V2
model = resnet50(weights=weights).eval()
preprocess = weights.transforms()
# x: float tensor of shape (N, 3, H, W), e.g. preprocess(pil_img).unsqueeze(0)
with torch.no_grad():
    logits = model(x)
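The logits are unnormalized scores, one per class. A minimal NumPy sketch of turning them into probabilities and a top-1 prediction follows; the logits array is hypothetical and has three classes instead of ImageNet's 1000. In PyTorch the equivalent calls would be logits.softmax(dim=1) and argmax(dim=1), with weights.meta["categories"] supplying the label names.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax; subtracting the row max keeps exp() from overflowing."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


# Hypothetical logits for a batch of two images and three classes.
logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2, 3.0]])

probs = softmax(logits)      # each row sums to 1
top1 = probs.argmax(axis=1)  # index of the most likely class per image
print(top1)                  # [0 2]
```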
Transforms v2 (brief)
Newer APIs under torchvision.transforms.v2 accept tensors or PIL and support bbox/mask transforms consistently—prefer them for detection/segmentation training when your version includes them.
Takeaways
- Always match preprocessing to the weights you load.
- Use torchvision.ops for NMS, ROI align, etc., in detection heads.
- Pin torch and torchvision versions that are tested together.
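To make the NMS step concrete, here is a plain NumPy sketch of greedy non-maximum suppression. It is an illustration of the technique, not the torchvision.ops.nms implementation; it assumes boxes in (x1, y1, x2, y2) format and drops any box whose IoU with a higher-scoring kept box exceeds the threshold. The boxes and scores are made up.

```python
import numpy as np


def iou_one_to_many(box, boxes):
    """IoU between one box and an array of boxes, all as (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)


def greedy_nms(boxes, scores, iou_threshold):
    """Return indices of kept boxes, highest score first."""
    order = np.argsort(-scores)  # candidate indices, best score first
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        ious = iou_one_to_many(boxes[best], boxes[rest])
        order = rest[ious <= iou_threshold]  # drop heavy overlaps with `best`
    return keep


# Two heavily overlapping boxes and one separate box.
boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
kept = greedy_nms(boxes, scores, iou_threshold=0.5)
print(kept)  # [0, 2]
```

The first two boxes have an IoU of 81/119, about 0.68, so the lower-scoring one is suppressed. In training or inference code the library routine is preferable, since it runs on tensors and on the GPU.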
Quick FAQ
- To cut GPU memory use, try mixed precision (torch.cuda.amp), gradient accumulation, or smaller models.
- Pass a custom collate_fn to DataLoader to pad variable-size images or merge dict targets for detection.
TensorFlow vision
Keras Applications
import tensorflow as tf
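base = tf.keras.applications.ResNet50(
    include_top=False,  # drop the 1000-class ImageNet head so a new one can be attached
    weights="imagenet",
    input_shape=(224, 224, 3),
    pooling="avg",  # global average pooling -> (batch, 2048) features
)
base.trainable = False  # freeze the backbone for feature extraction
# Attach a new head for num_classes:
# x = base.output
# x = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
# model = tf.keras.Model(base.input, x)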
Use the matching preprocess_input from the same module as the architecture (ResNet vs EfficientNet differ).
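To see why the functions are not interchangeable, the sketch below reproduces in NumPy what the Keras documentation describes for ResNet50's preprocess_input ("caffe" mode): RGB is flipped to BGR and the ImageNet channel means are subtracted, with no scaling. The EfficientNet preprocess_input, by contrast, is documented as a pass-through because rescaling happens inside the model. The helper name caffe_style_preprocess is made up, and the code is an illustration rather than the library's implementation.

```python
import numpy as np

# ImageNet channel means in BGR order, as used by "caffe"-style preprocessing.
IMAGENET_BGR_MEAN = np.array([103.939, 116.779, 123.68], dtype=np.float32)


def caffe_style_preprocess(rgb_batch):
    """RGB uint8 batch (N, H, W, 3) -> BGR float32 batch with the mean subtracted."""
    bgr = rgb_batch[..., ::-1].astype(np.float32)  # reverse the channel axis
    return bgr - IMAGENET_BGR_MEAN                 # zero-center, no scaling


# Hypothetical mid-gray batch standing in for real images.
batch = np.full((1, 224, 224, 3), 128, dtype=np.uint8)
out = caffe_style_preprocess(batch)
print(out.shape, out.dtype)  # (1, 224, 224, 3) float32
```

Feeding such zero-centered BGR values to a model that expects raw 0 to 255 RGB, or the reverse, usually degrades accuracy without raising any error.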
tf.data from file paths
def load_decode(path, label):
    img = tf.io.read_file(path)
    img = tf.image.decode_jpeg(img, channels=3)
    img = tf.image.resize(img, [224, 224])  # returns float32
    return img, label
# paths: list of image file paths; labels: matching list of integer labels
paths_ds = tf.data.Dataset.from_tensor_slices((paths, labels))
ds = paths_ds.map(load_decode, num_parallel_calls=tf.data.AUTOTUNE)
ds = ds.batch(32).prefetch(tf.data.AUTOTUNE)
KerasCV (optional)
pip install keras-cv adds detection models, augmentations, and COCO-style metrics aligned with Keras 3 / multi-backend workflows—check the version matrix against your TensorFlow install.
Takeaways
- Keep training and serving preprocessing identical when possible (layers or SavedModel signatures).
- Mixed precision: tf.keras.mixed_precision.Policy("mixed_float16") on supported GPUs.
- Compare with PyTorch for team skill fit and deployment targets (TF Lite, Edge TPU, JAX).