Core idea
The network outputs dense tensors encoding, for each spatial location (and scale): objectness or class scores, box coordinates (center, size, or distances to sides), and sometimes mask coefficients. Training matches predictions to ground truth with IoU-based assignment and a multi-part loss (classification + localization + objectness). At inference, low-confidence predictions are filtered and non-maximum suppression (NMS) removes overlapping boxes.
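The filtering and NMS step at the end can be sketched in plain Python. This is a simplified greedy NMS over `[x1, y1, x2, y2]` boxes, for illustration only; production implementations are vectorized and usually class-aware:

```python
def iou(a, b):
    # Intersection-over-union of two [x1, y1, x2, y2] boxes.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, conf_thres=0.25, iou_thres=0.45):
    # Greedy NMS: drop low-confidence boxes, then visit the rest in
    # descending score order, keeping a box only if it does not overlap
    # an already-kept box above iou_thres.
    order = sorted(
        (i for i, s in enumerate(scores) if s >= conf_thres),
        key=lambda i: scores[i],
        reverse=True,
    )
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_thres for j in keep):
            keep.append(i)
    return keep
```

For example, two heavily overlapping boxes collapse to the higher-scoring one, while a distant box survives: `nms([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]], [0.9, 0.8, 0.7])` keeps indices 0 and 2.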
Why fast?
Single backbone + detection head; highly optimized implementations (TensorRT, ONNX Runtime).
Trade-offs
Tiny models on hard scenes (crowds, small objects) may trail heavy two-stage detectors on mAP.
Version sketch
- YOLOv3 / v4 / v5 — multi-scale predictions, strong community adoption; v5/v8 ecosystems centered on Ultralytics tooling.
- YOLOv8 / YOLO11 (Ultralytics) — unified API for detect, segment, classify, pose; improved training pipeline and export.
- Newer papers (YOLOv9, YOLOv10, etc.) make their own architectural claims; check the exact paper/repo before citing them.
Ultralytics: predict on image / video
Install: pip install ultralytics (pulls PyTorch). Weights download on first use.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")  # nano is fastest; try s/m/l for more accuracy
results = model.predict("https://ultralytics.com/images/bus.jpg", conf=0.25)
r = results[0]
if r.boxes is not None:
    xyxys = r.boxes.xyxy.cpu().numpy()
    clss = r.boxes.cls.cpu().numpy()
    confs = r.boxes.conf.cpu().numpy()
    for i in range(len(xyxys)):
        x1, y1, x2, y2 = xyxys[i].tolist()
        cls = int(clss[i])
        conf = float(confs[i])
        print(cls, conf, (x1, y1, x2, y2))
r.save("out.jpg")  # annotated image
# r.show()  # optional: requires a display
Batch and streaming
results = model.predict(["a.jpg", "b.jpg"], device=0, imgsz=640)
for r in model.predict(source="video.mp4", stream=True):
    pass  # process each Results without holding all frames in RAM
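Inside a streaming loop you usually reduce each frame's detections to the few you act on. A minimal sketch (the `filter_detections` helper is hypothetical, not part of the Ultralytics API) that keeps only chosen classes above a confidence floor:

```python
def filter_detections(classes, confs, keep_classes, min_conf=0.5):
    # classes / confs: parallel per-frame lists, e.g. pulled from
    # r.boxes.cls.tolist() and r.boxes.conf.tolist().
    # Returns indices of detections matching keep_classes at or
    # above min_conf.
    return [
        i
        for i, (c, s) in enumerate(zip(classes, confs))
        if c in keep_classes and s >= min_conf
    ]
```

In the loop above this would be called per frame, e.g. `filter_detections(r.boxes.cls.tolist(), r.boxes.conf.tolist(), keep_classes={0})` to keep only class 0 (person in COCO-trained weights).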
Train and export (outline)
# Dataset: YOLO txt labels + yaml pointing to train/val images
model = YOLO("yolov8n.pt")
model.train(data="coco8.yaml", epochs=50, imgsz=640, batch=16)
model.export(format="onnx", opset=12) # deploy with ONNX Runtime / TensorRT
Replace coco8.yaml with your dataset YAML; validate paths and class count.
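The dataset YAML referenced above typically has this shape; all paths and class names below are placeholders for your own data:

```yaml
# dataset.yaml: placeholder paths, adjust to your layout
path: /data/my_dataset    # dataset root
train: images/train       # relative to path
val: images/val
names:
  0: person
  1: car
```

Each image in `images/train` is paired with a same-named `.txt` label file (one `class x_center y_center width height` line per object, normalized to 0–1) under a parallel `labels/` directory.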
Speed tips
- Lower imgsz (e.g. 416) for throughput; raise it for small objects.
- Use nano/small weights; quantize (INT8) after calibration on representative data.
- Enable half precision (half=True on CUDA) when numerically stable.
- For CPU-only, prefer ONNX Runtime with an optimized graph, or OpenVINO where available.
Takeaways
- YOLO = one-shot detector family optimized for latency.
- Ultralytics provides a practical training, validation, and export loop.
- Match conf / NMS settings to your precision–recall needs.