Computer Vision Chapter 46

CV evaluation metrics

Choosing metrics that match the task avoids misleading comparisons. Classification uses accuracy, precision/recall, F1, ROC-AUC, and confusion matrices (watch class imbalance). Object detection pairs predicted and ground-truth boxes with IoU thresholds, then averages precision–recall into mAP (COCO uses multiple IoUs and max detections). Segmentation often reports mean IoU or Dice per class. Below: IoU in NumPy and a classification F1 sketch.

IoU for axis-aligned boxes

import numpy as np

def iou_box(a, b):
    # a, b = [x1, y1, x2, y2], assuming x2 > x1 and y2 > y1
    x1 = np.maximum(a[0], b[0])
    y1 = np.maximum(a[1], b[1])
    x2 = np.minimum(a[2], b[2])
    y2 = np.minimum(a[3], b[3])
    # Clamp to 0 so disjoint boxes give zero intersection, not a negative one.
    inter = np.maximum(0.0, x2 - x1) * np.maximum(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter + 1e-6  # epsilon guards against divide-by-zero
    return inter / union

A prediction is a “true positive” for a class if its IoU with an as-yet unmatched ground-truth box of that class meets the threshold (e.g. 0.5); each ground truth can be matched at most once.
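The matching rule above can be sketched as a greedy matcher: process predictions in descending score order and let each claim the best still-unmatched ground truth. The function name and return format are illustrative assumptions, not a standard API; the IoU helper is repeated so the snippet runs standalone.

```python
def iou_box(a, b):
    # Same scalar IoU as above, repeated so this snippet is self-contained.
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def match_predictions(preds, gts, thr=0.5):
    """preds: list of (score, box); gts: list of boxes (one class).
    Returns [(score, is_true_positive)] in descending score order."""
    used = [False] * len(gts)
    out = []
    for score, box in sorted(preds, key=lambda p: -p[0]):
        best, best_iou = -1, thr
        for j, g in enumerate(gts):
            if not used[j]:
                v = iou_box(box, g)
                if v >= best_iou:
                    best, best_iou = j, v
        if best >= 0:
            used[best] = True   # each ground truth matches at most once
            out.append((score, True))
        else:
            out.append((score, False))
    return out
```

A duplicate detection of an already-matched object counts as a false positive, which is exactly what penalizes double predictions in mAP.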

Precision, recall, F1

from sklearn.metrics import precision_recall_fscore_support

# y_true, y_pred: per-sample class indices
p, r, f1, _ = precision_recall_fscore_support(y_true, y_pred, average="macro")
# average="weighted" weights by support; "binary" for two-class
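To make the macro average concrete, here is a small pure-Python version of the same computation (a sketch of what the sklearn call does, not a replacement for it):

```python
def macro_f1(y_true, y_pred):
    # Macro F1: compute F1 per class, then average unweighted,
    # so rare classes count as much as common ones.
    classes = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)
```

For example, `macro_f1([0, 0, 1, 1], [0, 1, 1, 1])` averages F1 of 2/3 (class 0) and 0.8 (class 1).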

Segmentation IoU / Dice

For label masks A, B ∈ {0,…,K−1}^(H×W), per-class IoU is |{A=k} ∩ {B=k}| / |{A=k} ∪ {B=k}|. Dice is 2|A∩B| / (|A| + |B|), computed on binary masks or per class. Report the mean over classes, excluding a void/ignore class when the evaluation protocol requires it.
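The per-class formulas translate directly to NumPy; a minimal sketch (function name is an assumption):

```python
import numpy as np

def per_class_iou_dice(a, b, num_classes):
    """a, b: integer label masks of shape (H, W).
    Returns (ious, dices), one value per class; NaN for absent classes."""
    ious, dices = [], []
    for k in range(num_classes):
        ak, bk = (a == k), (b == k)
        inter = np.logical_and(ak, bk).sum()
        union = np.logical_or(ak, bk).sum()
        ious.append(inter / union if union else float("nan"))
        total = ak.sum() + bk.sum()
        dices.append(2 * inter / total if total else float("nan"))
    return ious, dices
```

Mean IoU is then the nan-aware average over the returned list (e.g. `np.nanmean`), skipping classes absent from both masks.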

mAP (detection)

Sort predictions by score; traverse thresholds to build precision–recall curve; AP is area under that curve (or interpolated variant). mAP averages AP over classes. COCO-style evaluation adds IoU 0.5:0.95, area splits, and caps on detections per image—use pycocotools or framework builtins for exact parity.
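Assuming predictions for one class have already been labeled true/false positive as in the matching step, AP can be sketched as follows (names are illustrative; use pycocotools for benchmark-exact numbers):

```python
def average_precision(scored_tps, num_gt):
    """scored_tps: [(score, is_tp)] for one class; num_gt: total ground truths.
    Returns AP as the area under the precision-recall curve
    (all-points interpolation)."""
    scored_tps = sorted(scored_tps, key=lambda x: -x[0])
    tps = fps = 0
    precisions, recalls = [], []
    for _, is_tp in scored_tps:
        tps += is_tp
        fps += not is_tp
        precisions.append(tps / (tps + fps))
        recalls.append(tps / num_gt)
    # Make the precision envelope non-increasing from the right
    # before integrating, as standard AP definitions do.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_r = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += p * (r - prev_r)
        prev_r = r
    return ap
```

mAP is then the mean of this value over classes (and, COCO-style, over IoU thresholds).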

Takeaways

  • Always document splits, preprocessing, and class definitions when reporting numbers.
  • For deployment, also measure latency, memory, and failure cases.
  • Statistical tests or confidence intervals help when differences are small.
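For the last point, a percentile bootstrap is a common way to attach a confidence interval to any per-sample metric; a minimal sketch under the assumption of i.i.d. samples (function names are illustrative):

```python
import random

def bootstrap_ci(metric_fn, y_true, y_pred, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample (y_true, y_pred) pairs with
    replacement, recompute the metric, take the alpha/2 quantiles."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric_fn([y_true[i] for i in idx],
                               [y_pred[i] for i in idx]))
    stats.sort()
    lo = stats[int(alpha / 2 * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Example metric: plain accuracy.
acc = lambda t, p: sum(a == b for a, b in zip(t, p)) / len(t)
```

If two models' intervals overlap heavily on the same test set, the observed difference may not be meaningful.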

Quick FAQ

Why report mAP for detection rather than plain accuracy? Images contain variable numbers of objects; mAP summarizes localization and classification jointly across multiple score cutoffs.

Does test-time augmentation (TTA) affect reported numbers? Averaging predictions over flipped or scaled views can boost metrics; declare TTA in papers and watch the inference cost.
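The flip-averaging form of TTA can be sketched in a few lines; `model_fn` is an assumed callable mapping an image array to a probability vector, not a specific framework API:

```python
import numpy as np

def tta_predict(model_fn, image):
    """Average class probabilities over the original image and its
    horizontal flip (a minimal TTA sketch; more views scale cost linearly)."""
    views = [image, image[:, ::-1]]          # original + horizontal flip
    probs = [model_fn(v) for v in views]
    return np.mean(probs, axis=0)
```

For tasks with spatial outputs (segmentation, detection), remember to un-flip each view's prediction before averaging.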