IoU for axis-aligned boxes
def iou_box(a, b):
    """IoU of two axis-aligned boxes a, b = [x1, y1, x2, y2]."""
    x1 = max(a[0], b[0])
    y1 = max(a[1], b[1])
    x2 = min(a[2], b[2])
    y2 = min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = max(0, a[2] - a[0]) * max(0, a[3] - a[1])
    area_b = max(0, b[2] - b[0]) * max(0, b[3] - b[1])
    union = area_a + area_b - inter + 1e-6  # epsilon guards against zero-area union
    return inter / union
A prediction counts as a true positive for a class if its IoU with a not-yet-matched ground-truth box of that class is at least the threshold (e.g. 0.5); each ground truth can be matched at most once, and remaining predictions are false positives.
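The matching rule above can be sketched as a greedy, single-class procedure using the `iou_box` function. This is a minimal sketch (the helper name `match_predictions` and the dict layout for predictions are assumptions, not a library API); real evaluators also handle multiple classes and crowd/ignore regions.

```python
def iou_box(a, b):
    # axis-aligned IoU, as defined above
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-6)

def match_predictions(preds, gts, thr=0.5):
    """Greedy single-class matching: visit predictions by descending score;
    each ground-truth box can be claimed once. Returns TP/FP flags."""
    preds = sorted(preds, key=lambda p: -p["score"])
    claimed = set()
    flags = []
    for p in preds:
        best_iou, best_j = 0.0, -1
        for j, g in enumerate(gts):
            if j in claimed:
                continue
            v = iou_box(p["box"], g)
            if v > best_iou:
                best_iou, best_j = v, j
        if best_iou >= thr:
            claimed.add(best_j)  # this GT is now used up
            flags.append(True)
        else:
            flags.append(False)  # duplicate or low-overlap detection
    return flags
```

A duplicate detection of an already-claimed ground truth is scored as a false positive, which is what penalizes double predictions in mAP.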
Precision, recall, F1
from sklearn.metrics import precision_recall_fscore_support
# y_true, y_pred: per-sample class indices
p, r, f1, _ = precision_recall_fscore_support(y_true, y_pred, average="macro")
# average="weighted" weights by support; "binary" for two-class
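To make the macro average concrete, here is a from-scratch sketch equivalent to `average="macro"` (the helper name `macro_prf` is an assumption): compute precision, recall, and F1 per class from TP/FP/FN counts, then take the unweighted mean over classes.

```python
def macro_prf(y_true, y_pred, classes):
    """Macro-averaged precision/recall/F1 from per-sample class labels."""
    ps, rs, fs = [], [], []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        ps.append(prec); rs.append(rec); fs.append(f1)
    n = len(classes)
    return sum(ps) / n, sum(rs) / n, sum(fs) / n
```

Note that macro averaging gives rare classes the same weight as common ones, which is exactly why it can differ sharply from `average="weighted"` on imbalanced data.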
Segmentation IoU / Dice
For label masks A, B ∈ {0,…,K-1}^H×W, per-class IoU is |A=k ∩ B=k| / |A=k ∪ B=k|. Dice = 2|A∩B| / (|A|+|B|), computed once for binary masks or per class otherwise. Report the mean over classes, excluding the void/ignore label if the protocol requires.
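These per-class definitions translate directly to NumPy on integer label masks. A minimal sketch (the function name `per_class_iou_dice` and the `ignore` parameter are assumptions); classes absent from both masks get NaN so they can be skipped when averaging.

```python
import numpy as np

def per_class_iou_dice(a, b, num_classes, ignore=None):
    """Per-class IoU and Dice for integer label masks a, b of equal shape."""
    ious, dices = {}, {}
    for k in range(num_classes):
        if ignore is not None and k == ignore:
            continue  # skip the void/ignore label
        pa, pb = (a == k), (b == k)
        inter = np.logical_and(pa, pb).sum()
        union = np.logical_or(pa, pb).sum()
        ious[k] = inter / union if union else float("nan")
        s = pa.sum() + pb.sum()
        dices[k] = 2 * inter / s if s else float("nan")
    return ious, dices
```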
mAP (detection)
Sort predictions by score; traverse thresholds to build precision–recall curve; AP is area under that curve (or interpolated variant). mAP averages AP over classes. COCO-style evaluation adds IoU 0.5:0.95, area splits, and caps on detections per image—use pycocotools or framework builtins for exact parity.
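The AP step above can be sketched for one class, assuming the TP/FP flags are already sorted by descending score (e.g. from a matcher like the one earlier). This uses the all-point interpolation, where precision is made monotonically non-increasing before integrating over recall; COCO's exact numbers additionally depend on its 101-point recall grid and other protocol details, so use pycocotools for parity.

```python
import numpy as np

def average_precision(flags, num_gt):
    """AP from score-sorted TP/FP flags and the number of ground truths."""
    flags = np.asarray(flags, dtype=float)
    tp = np.cumsum(flags)
    fp = np.cumsum(1.0 - flags)
    recall = tp / max(num_gt, 1)
    precision = tp / (tp + fp)
    # sentinel endpoints, then enforce monotone non-increasing precision
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([1.0], precision, [0.0]))
    p = np.maximum.accumulate(p[::-1])[::-1]
    # area under the step-wise precision-recall curve
    return float(np.sum((r[1:] - r[:-1]) * p[1:]))
```

Missed ground truths lower AP implicitly: recall never reaches 1, so the final recall segment contributes zero precision.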
Takeaways
- Always document splits, preprocessing, and class definitions when reporting numbers.
- For deployment, also measure latency, memory, and failure cases.
- Statistical tests or confidence intervals help when differences are small.
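For the last point, a percentile bootstrap over per-sample scores is a simple way to attach a confidence interval to a mean metric. A minimal sketch (the name `bootstrap_ci` and its defaults are assumptions); for paired model comparisons, bootstrap the per-sample score differences instead.

```python
import numpy as np

def bootstrap_ci(per_sample_scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of per-sample metric scores."""
    rng = np.random.default_rng(seed)
    scores = np.asarray(per_sample_scores, dtype=float)
    means = np.array([
        rng.choice(scores, size=scores.size, replace=True).mean()
        for _ in range(n_boot)
    ])
    lo, hi = np.quantile(means, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)
```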