Annotation structure (instances)
Top-level keys:
- images — id, file_name, height, width
- annotations — image_id, category_id, bbox [x, y, w, h], area, segmentation, iscrowd
- categories — id, name, supercategory
iscrowd=1 marks RLE-encoded group regions; evaluation treats them differently from single-object polygons (detections matched to a crowd region are ignored rather than counted as false positives).
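A minimal instances file, sketched as a Python dict (all ids, names, and numbers are illustrative):

```python
# Minimal COCO instances annotation structure (illustrative values).
instances = {
    "images": [
        {"id": 1, "file_name": "000000000001.jpg", "height": 480, "width": 640},
    ],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1,
         "bbox": [100.0, 50.0, 80.0, 200.0],  # [x, y, width, height], top-left origin
         "area": 16000.0,                     # segmentation area (here ~ w * h)
         "segmentation": [[100, 50, 180, 50, 180, 250, 100, 250]],  # polygon(s)
         "iscrowd": 0},                       # 1 would mean an RLE crowd region
    ],
    "categories": [
        {"id": 1, "name": "person", "supercategory": "person"},
    ],
}
print(sorted(instances))
```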
pycocotools
# pip install pycocotools
from pycocotools.coco import COCO
ann_file = "annotations/instances_val2017.json"
coco = COCO(ann_file)  # builds lookup indices over images/annotations/categories
ids = coco.getImgIds(catIds=coco.getCatIds(catNms=["person"]))  # images containing a person
img = coco.loadImgs(ids[0])[0]  # dict with file_name, height, width, ...
Use coco.loadAnns(coco.getAnnIds(imgIds=img["id"])) plus coco.showAnns for visualization; detection evaluation uses COCOeval (pycocotools.cocoeval) fed with predictions in the COCO results JSON format.
Related tracks
- Captions — five reference sentences per image; metrics include BLEU, CIDEr, SPICE.
- Keypoints — 17 body joints per person instance.
- Panoptic — joint stuff + thing segmentation (separate challenge materials).
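For the keypoints track, each person annotation stores the 17 joints as a flat list of 51 numbers ((x, y, v) triplets, where v = 0 is not labeled, 1 is labeled but not visible, 2 is labeled and visible). A small decoding sketch (the sample values are hypothetical):

```python
# The 17 COCO keypoint names, in annotation order.
COCO_KEYPOINTS = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]

def decode_keypoints(flat):
    """Split a flat 51-number keypoints list into {name: (x, y, visibility)}."""
    triplets = [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]
    return dict(zip(COCO_KEYPOINTS, triplets))

# Hypothetical annotation fragment: only the nose is labeled.
kps = decode_keypoints([120, 80, 2] + [0, 0, 0] * 16)
print(kps["nose"])  # (120, 80, 2)
```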
torchvision built-ins
from torchvision.datasets import CocoDetection
# root = image folder, annFile = instances_*.json
ds = CocoDetection(root="val2017", annFile="annotations/instances_val2017.json")
# ds[i] -> (PIL image, list of annotation dicts)
Pair it with the detection transforms in torchvision.transforms.v2 (recent torchvision can wrap the dataset via torchvision.datasets.wrap_dataset_for_transforms_v2) so each sample returns an image plus a target dict of boxes and labels.
Takeaways
- Always match evaluation protocol (IoU range, max detections, area buckets) when comparing papers.
- 2017 split is the common modern reference; older 2014 still appears in legacy code.
- Respect the COCO license and attribution when redistributing derived sets.
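The protocol knobs in the first takeaway have well-known defaults; a sketch of the values pycocotools uses for bbox evaluation (mirroring its Params defaults):

```python
# Default COCO detection evaluation protocol (sketch of pycocotools' Params):
iou_thrs = [round(0.5 + 0.05 * i, 2) for i in range(10)]  # IoU .50:.05:.95
max_dets = [1, 10, 100]          # per-image detection caps used for AR
area_rng = {                     # area buckets, in pixels^2
    "small":  (0, 32 ** 2),
    "medium": (32 ** 2, 96 ** 2),
    "large":  (96 ** 2, 1e5 ** 2),
}
print(iou_thrs)
```

Reporting AP at a single IoU (e.g. 0.5 only) or with a different max_dets cap yields numbers that are not comparable to the standard protocol.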