Face recognition
Face detection: Haar cascade
import cv2
# img_bgr is assumed to be a BGR image that is already loaded (e.g. via cv2.imread)
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)
gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)  # the cascade runs on grayscale
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5, minSize=(40, 40))
for (x, y, w, h) in faces:
    # draw a green box around each detected face
    cv2.rectangle(img_bgr, (x, y), (x + w, y + h), (0, 255, 0), 2)
Fast and dependency-light, but less robust than modern CNN detectors under difficult lighting or non-frontal poses.
DNN face detector (OpenCV)
Download a pretrained Caffe or TensorFlow model from the OpenCV model zoo (e.g. a single-shot detector variant). Load it with cv2.dnn.readNetFromCaffe or cv2.dnn.readNetFromTensorflow, build a blob from the image, call net.setInput and net.forward, then decode the boxes and apply a confidence threshold and NMS.
net = cv2.dnn.readNetFromTensorflow("opencv_face_detector_uint8.pb",
                                    "opencv_face_detector.pbtxt")
h, w = img_bgr.shape[:2]
blob = cv2.dnn.blobFromImage(img_bgr, 1.0, (300, 300), [104, 117, 123])
net.setInput(blob)
detections = net.forward()
# iterate detections[0,0,i,:] — confidence, box coords — apply threshold + NMS
Exact decoding depends on the model’s output layout; see OpenCV samples for the matching version.
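As a rough illustration of the decoding step, the sketch below assumes the common SSD-style output layout of shape [1, 1, N, 7], where each row is [image_id, class_id, confidence, x1, y1, x2, y2] with coordinates normalized to [0, 1]. That layout is an assumption, as are the helper names decode_detections, iou, and nms and the threshold values; check them against the model you actually load. The code only indexes and iterates, so it accepts either the array returned by net.forward() or plain nested lists.

```python
def decode_detections(detections, img_w, img_h, conf_thresh=0.5):
    """Turn SSD-style rows into pixel boxes (x1, y1, x2, y2, confidence).

    Assumes detections[0][0] is a sequence of rows
    [image_id, class_id, confidence, x1, y1, x2, y2] with normalized coords.
    """
    boxes = []
    for row in detections[0][0]:
        conf = float(row[2])
        if conf < conf_thresh:
            continue
        # Scale to pixels and clamp to the image bounds.
        x1 = min(max(int(float(row[3]) * img_w), 0), img_w - 1)
        y1 = min(max(int(float(row[4]) * img_h), 0), img_h - 1)
        x2 = min(max(int(float(row[5]) * img_w), 0), img_w - 1)
        y2 = min(max(int(float(row[6]) * img_h), 0), img_h - 1)
        if x2 > x1 and y2 > y1:  # drop degenerate boxes
            boxes.append((x1, y1, x2, y2, conf))
    return boxes


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2, ...) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, iou_thresh=0.4):
    """Greedy NMS: keep the highest-confidence box, drop heavy overlaps."""
    kept = []
    for box in sorted(boxes, key=lambda b: b[4], reverse=True):
        if all(iou(box, k) <= iou_thresh for k in kept):
            kept.append(box)
    return kept
```

With the snippet above, a call such as nms(decode_detections(detections, w, h)) would yield the boxes to draw with cv2.rectangle. Many SSD face models already suppress duplicates internally, in which case the NMS pass changes little.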
Embeddings (concept)
Crop the face, resize it to the network input size (often 112×112), and run the backbone and embedding head. L2-normalize the resulting vectors so that cosine similarity equals the dot product.
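The cropping step is often done with a square box and a small margin so that the resize does not distort the face. A minimal sketch, assuming an (x, y, w, h) box such as the one returned by the Haar detector; square_crop_box and the 20% margin are illustrative choices, not values taken from any particular model.

```python
def square_crop_box(x, y, w, h, img_w, img_h, margin=0.2):
    """Return (x1, y1, x2, y2): a square around the box, clamped to the image."""
    side = int(round(max(w, h) * (1.0 + margin)))
    cx, cy = x + w / 2.0, y + h / 2.0
    x1 = int(round(cx - side / 2.0))
    y1 = int(round(cy - side / 2.0))
    # Clamp to the image; the crop stops being square if the face touches a border.
    x1c, y1c = max(x1, 0), max(y1, 0)
    x2c, y2c = min(x1 + side, img_w), min(y1 + side, img_h)
    return x1c, y1c, x2c, y2c
```

The crop itself would then be img_bgr[y1:y2, x1:x2], followed by cv2.resize(face, (112, 112)) or whatever input size the encoder expects.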
Verification: cosine similarity
import torch
import torch.nn.functional as F
def l2n(x):
    return F.normalize(x, dim=1)
# e1, e2: [1, D] from your face encoder
sim = (l2n(e1) * l2n(e2)).sum(dim=1)
same_person = sim > 0.35 # threshold is model- and dataset-specific
Takeaways
- Detection quality limits end-to-end accuracy; align faces (e.g. by eye landmarks) before encoding when possible.
- Use calibrated thresholds; report FAR/FRR for security-sensitive use.
- Privacy: biometrics need consent, secure storage, and compliance (e.g. GDPR).
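To make the point about thresholds concrete, the sketch below computes the false accept rate (FAR) and false reject rate (FRR) at a given threshold from labeled similarity scores. The function name far_frr and the scores are made up for illustration; in practice the scores would come from genuine (same person) and impostor (different people) pairs in a held-out set.

```python
def far_frr(genuine_scores, impostor_scores, threshold):
    """FAR = share of impostor pairs accepted; FRR = share of genuine pairs rejected.

    A pair is accepted when its similarity is strictly above the threshold,
    matching the `sim > threshold` rule used for verification.
    """
    if not genuine_scores or not impostor_scores:
        raise ValueError("need at least one genuine and one impostor score")
    false_accepts = sum(1 for s in impostor_scores if s > threshold)
    false_rejects = sum(1 for s in genuine_scores if s <= threshold)
    return (false_accepts / len(impostor_scores),
            false_rejects / len(genuine_scores))


# Hypothetical scores, for illustration only.
genuine = [0.62, 0.48, 0.71, 0.30, 0.55]
impostor = [0.10, 0.22, 0.41, 0.05, 0.18]

for t in (0.2, 0.35, 0.5):
    far, frr = far_frr(genuine, impostor, t)
    print(f"threshold={t:.2f}  FAR={far:.2f}  FRR={frr:.2f}")
```

Raising the threshold lowers FAR at the cost of a higher FRR, so the operating point should be chosen on data that resembles the deployment conditions.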
Pose estimation
COCO-17 keypoints (idea)
The 17 keypoints are, in order: nose, then left/right eyes, ears, shoulders, elbows, wrists, hips, knees, and ankles. Each predicted point has (x, y) coordinates and often a confidence score; low confidence usually indicates occlusion or an out-of-frame joint. Connect pairs with a fixed edge list to render a skeleton.
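The keypoint order and edge list can be written down directly. The sketch below uses the standard COCO-17 names; the edge list is one commonly used choice (renderers differ), and visible_segments and the 0.3 confidence cutoff are illustrative assumptions.

```python
COCO_KEYPOINTS = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]

# Edges written by name to avoid index mistakes.
SKELETON = [
    ("left_ankle", "left_knee"), ("left_knee", "left_hip"),
    ("right_ankle", "right_knee"), ("right_knee", "right_hip"),
    ("left_hip", "right_hip"),
    ("left_shoulder", "left_hip"), ("right_shoulder", "right_hip"),
    ("left_shoulder", "right_shoulder"),
    ("left_shoulder", "left_elbow"), ("left_elbow", "left_wrist"),
    ("right_shoulder", "right_elbow"), ("right_elbow", "right_wrist"),
    ("nose", "left_eye"), ("nose", "right_eye"),
    ("left_eye", "left_ear"), ("right_eye", "right_ear"),
]

INDEX = {name: i for i, name in enumerate(COCO_KEYPOINTS)}


def visible_segments(keypoints, min_conf=0.3):
    """keypoints: 17 (x, y, confidence) tuples in COCO order.

    Returns ((xa, ya), (xb, yb)) pairs for edges whose two ends are both
    at or above min_conf, so occluded joints are simply not drawn.
    """
    if len(keypoints) != len(COCO_KEYPOINTS):
        raise ValueError("expected 17 keypoints in COCO order")
    segments = []
    for a, b in SKELETON:
        xa, ya, ca = keypoints[INDEX[a]]
        xb, yb, cb = keypoints[INDEX[b]]
        if ca >= min_conf and cb >= min_conf:
            segments.append(((xa, ya), (xb, yb)))
    return segments
```

Each returned segment can be drawn with cv2.line after casting the coordinates to integers.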
MediaPipe Pose (Python)
# pip install mediapipe opencv-python
import cv2
import mediapipe as mp
mp_pose = mp.solutions.pose
mp_draw = mp.solutions.drawing_utils
img_rgb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)
with mp_pose.Pose(static_image_mode=True) as pose:
    res = pose.process(img_rgb)
if res.pose_landmarks:
    mp_draw.draw_landmarks(
        img_bgr, res.pose_landmarks, mp_pose.POSE_CONNECTIONS)
For video, set static_image_mode=False and reuse the same Pose instance across frames for smoother tracking.
OpenCV DNN (OpenPose-style)
OpenCV samples load Caffe/ONNX multi-branch models that output heatmaps and part affinity fields. You download the model files from the OpenCV GitHub wiki, run net.forward, then decode peaks and associate limbs—more code than MediaPipe but fully offline and customizable.
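For a single person, the peak-decoding step reduces to finding the maximum of each heatmap and rescaling it to image coordinates. The sketch below shows only that part, on plain nested lists; heatmap_peaks is a hypothetical helper, the cell-centre mapping is one possible convention, and the part-affinity-field association needed for multi-person scenes is deliberately left out.

```python
def heatmap_peaks(heatmaps, img_w, img_h, min_score=0.1):
    """heatmaps: list of 2D grids (rows of floats), one per body part.

    Returns one (x, y, score) per part in image pixels, or None when the
    strongest response in that heatmap is below min_score.
    """
    peaks = []
    for grid in heatmaps:
        rows, cols = len(grid), len(grid[0])
        # Strongest cell in this heatmap.
        best_score, best_r, best_c = max(
            (grid[r][c], r, c) for r in range(rows) for c in range(cols)
        )
        if best_score < min_score:
            peaks.append(None)
            continue
        # Map the cell centre from heatmap resolution to image resolution.
        x = (best_c + 0.5) * img_w / cols
        y = (best_r + 0.5) * img_h / rows
        peaks.append((x, y, best_score))
    return peaks
```

Which output channels hold part heatmaps (as opposed to affinity fields or background) depends on the model, so the slice passed to this helper has to be chosen per model.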
3D pose
3D pose estimation extends the 2D case to camera-centered 3D joint coordinates, obtained through monocular lifting, multi-view fusion, or depth sensors. It is often paired with biomechanics or AR applications.
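For the depth-sensor route, a 2D keypoint and its depth value can be back-projected into camera coordinates with the pinhole model. A minimal sketch, assuming known intrinsics (fx, fy, cx, cy), no lens distortion, and a depth map in metres aligned to the color image; the function name backproject and the intrinsic values in the example are placeholders.

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Pixel (u, v) with depth Z -> camera-centred (X, Y, Z) via the pinhole model."""
    if depth <= 0:
        return None  # invalid or missing depth reading
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


# Placeholder intrinsics, for illustration only.
FX, FY, CX, CY = 600.0, 600.0, 320.0, 240.0

print(backproject(320, 240, 2.0, FX, FY, CX, CY))  # joint at the principal point
print(backproject(620, 240, 2.0, FX, FY, CX, CY))  # joint 300 px to the right
```

Applying this to every confident 2D keypoint gives a camera-centered 3D skeleton; monocular lifting and multi-view fusion need a learned model or calibration between cameras instead.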
Takeaways
- Normalize crops and augment data for robustness to scale and clothing.
- Multi-person scenes need association (top-down boxes or bottom-up grouping).
- Ethics: pose in public spaces raises consent and surveillance concerns.