Computer Vision Chapter 48

COCO dataset

MS COCO (Common Objects in Context) provides everyday-scene images with rich annotations: object detection (bounding boxes), instance segmentation (polygon or RLE masks), person keypoints, and captions. It is a standard benchmark for detection/segmentation mAP. Official files are JSON; the pycocotools package loads annotations and implements COCO-style evaluation. Train/val/test splits and download scripts are documented on the COCO website.

Annotation structure (instances)

Top-level keys include images (id, file_name, height, width), annotations (image_id, category_id, bbox in [x, y, width, height] format with the origin at the image's top-left, area, segmentation, iscrowd), and categories (id, name, supercategory). iscrowd=1 marks RLE-encoded masks for groups of objects; during evaluation, detections matched to crowd regions are ignored rather than counted as false positives, unlike single-object polygon annotations.
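A minimal instances-style file can be sketched as plain Python before serializing to JSON. All ids, coordinates, and the file name below are made up for illustration; only the key names and the [x, y, width, height] bbox convention follow the real format:

```python
import json

# Hypothetical minimal instances annotation file (ids and values invented).
instances = {
    "images": [
        {"id": 1, "file_name": "000000000001.jpg", "height": 480, "width": 640}
    ],
    "annotations": [
        {
            "id": 10,
            "image_id": 1,
            "category_id": 1,                    # e.g. "person" in the real file
            "bbox": [100.0, 120.0, 50.0, 80.0],  # [x, y, width, height]
            "area": 4000.0,                      # mask area; bbox w*h here for simplicity
            "segmentation": [[100.0, 120.0, 150.0, 120.0,
                              150.0, 200.0, 100.0, 200.0]],  # polygon, iscrowd=0
            "iscrowd": 0,
        }
    ],
    "categories": [{"id": 1, "name": "person", "supercategory": "person"}],
}

text = json.dumps(instances)  # this is what an annotations/*.json file contains
print(sorted(instances.keys()))
```

The same three top-level lists appear in every instances file; only their contents change between splits.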

pycocotools

# pip install pycocotools
from pycocotools.coco import COCO

ann_file = "annotations/instances_val2017.json"
coco = COCO(ann_file)  # builds indices over images, annotations, categories

# all image ids containing at least one "person" annotation
ids = coco.getImgIds(catIds=coco.getCatIds(catNms=["person"]))
img = coco.loadImgs(ids[0])[0]  # dict with file_name, height, width, ...

Use coco.loadAnns / coco.showAnns for visualization; detection evaluation uses the COCOeval class, with predictions supplied as JSON in the COCO result format and loaded via coco.loadRes.
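The result format is just a flat list of per-detection dicts. A hedged sketch of building one (image_id, category_id, boxes, and scores below are invented; real values must match ids in the ground-truth file), with the evaluation calls shown as comments:

```python
import json

# Hypothetical detections for one image; each entry is one predicted box.
results = [
    {"image_id": 1, "category_id": 1,
     "bbox": [100.0, 120.0, 50.0, 80.0], "score": 0.92},
    {"image_id": 1, "category_id": 18,
     "bbox": [300.0, 200.0, 40.0, 40.0], "score": 0.55},
]

with open("detections.json", "w") as f:
    json.dump(results, f)

# With ground truth loaded as `coco`, evaluation would then look like:
#   from pycocotools.cocoeval import COCOeval
#   coco_dt = coco.loadRes("detections.json")
#   ev = COCOeval(coco, coco_dt, iouType="bbox")
#   ev.evaluate(); ev.accumulate(); ev.summarize()
```

COCOeval then matches predictions to ground truth per category and reports AP across IoU thresholds.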

Related tracks

  • Captions — image ↔ sentence pairs; metrics include BLEU, CIDEr, SPICE.
  • Keypoints — 17 body joints per person instance.
  • Panoptic — joint stuff + thing segmentation (separate challenge materials).
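Keypoint annotations store the 17 joints as a flat list of [x, y, v] triplets, where v=0 means not labeled, v=1 labeled but occluded, and v=2 visible. A small sketch with made-up coordinates that unpacks the flat list:

```python
# Made-up keypoints for one person: 17 triplets flattened into 51 numbers.
# Only the first two joints are labeled here; the rest are zeroed out.
keypoints = [320, 100, 2, 325, 95, 1] + [0, 0, 0] * 15

# Unpack into (x, y, visibility) triplets, one per joint.
joints = [tuple(keypoints[i:i + 3]) for i in range(0, len(keypoints), 3)]
num_labeled = sum(1 for _, _, v in joints if v > 0)

print(len(joints), num_labeled)  # 17 joints, 2 labeled
```

The annotation's num_keypoints field in the real files equals this labeled count.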

torchvision built-ins

from torchvision.datasets import CocoDetection

# root = image folder, annFile = instances_*.json
# ds = CocoDetection(root, annFile, transform=your_transform)

Pair it with detection transforms (the transforms.v2 API in recent torchvision) so the dataset returns an image plus a target dict of boxes and labels.
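Because CocoDetection yields a variable number of annotations per image, the default DataLoader collation fails; the usual fix is a collate function that keeps samples as lists instead of stacking. A minimal sketch in pure Python (the dummy batch stands in for real dataset output, so it runs without any annotation files):

```python
def detection_collate(batch):
    """Collate (image, target) pairs without stacking: targets differ in length."""
    images, targets = zip(*batch)
    return list(images), list(targets)

# Usage with a DataLoader (ds being a CocoDetection instance):
#   loader = DataLoader(ds, batch_size=2, collate_fn=detection_collate)

# Dummy batch mimicking dataset output: (image, list-of-annotation-dicts).
fake_batch = [("img0", [{"bbox": [0, 0, 10, 10]}]), ("img1", [])]
images, targets = detection_collate(fake_batch)
print(len(images), len(targets[0]), len(targets[1]))  # 2 1 0
```

Note the second image legitimately has zero annotations; per-sample lists handle that case where tensor stacking would not.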

Takeaways

  • Always match evaluation protocol (IoU range, max detections, area buckets) when comparing papers.
  • 2017 split is the common modern reference; older 2014 still appears in legacy code.
  • Respect the COCO license and attribution when redistributing derived sets.
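COCO mAP averages over IoU thresholds 0.50:0.95 in steps of 0.05, so the matching criterion is worth making concrete. A quick sketch of IoU for two boxes in COCO's [x, y, w, h] format:

```python
def iou_xywh(a, b):
    """Intersection-over-union of two boxes in COCO [x, y, w, h] format."""
    ax1, ay1, ax2, ay2 = a[0], a[1], a[0] + a[2], a[1] + a[3]
    bx1, by1, bx2, by2 = b[0], b[1], b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))  # overlap width
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))  # overlap height
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

print(iou_xywh([0, 0, 10, 10], [5, 0, 10, 10]))  # half-overlapping boxes -> IoU = 1/3
```

Two boxes shifted by half their width overlap on a third of their union, which is why they count as a match at IoU 0.3 but a miss at the standard 0.5 threshold.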

Quick FAQ

What is "minival"? Historically a subset of val used for faster iteration; definitions vary across codebases, so state exactly which image IDs you use.

How does COCO compare to Open Images? Open Images is larger and multi-label, and its evaluation tooling differs. Choose based on class vocabulary and annotation type.