CIDEr Score Tutorial

CIDEr Score

Consensus-based evaluation for image captioning.

CIDEr (Consensus-based Image Description Evaluation) is a metric designed specifically for Image Captioning tasks.

Level 1 — The Consensus Method

Instead of matching against a single reference, CIDEr compares the machine caption to a "consensus" of multiple human captions (five per image in MS COCO). It measures how closely the machine caption aligns with what most humans see in the image.
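The consensus idea can be sketched in a few lines: represent each caption as a vector of n-gram counts, then average the candidate's cosine similarity across all references. This is a simplified illustration (unigrams, raw counts, no TF-IDF yet); the function names and the toy captions are invented for the example.

```python
from collections import Counter

def ngrams(tokens, n):
    # All contiguous n-grams of a token list.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def cosine(a, b):
    # Cosine similarity between two sparse count vectors (Counters).
    dot = sum(a[k] * b[k] for k in a)
    norm_a = sum(v * v for v in a.values()) ** 0.5
    norm_b = sum(v * v for v in b.values()) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def consensus_score(candidate, references, n=1):
    # Average n-gram similarity of the candidate against ALL references.
    cand_vec = Counter(ngrams(candidate.split(), n))
    sims = [cosine(cand_vec, Counter(ngrams(r.split(), n)))
            for r in references]
    return sum(sims) / len(sims)

refs = ["a dog runs on the beach",
        "a dalmatian running along the shore",
        "a spotted dog runs by the sea"]
print(consensus_score("a dog runs on the shore", refs))
```

Because the score is averaged over every reference, a caption only scores highly when it overlaps with what most annotators wrote, not just with one of them.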

Level 2 — TF-IDF Weighting

CIDEr is smart: it uses TF-IDF to give more weight to rare, descriptive words (like "Dalmatian") and less weight to common words (like "dog"). If you get the rare words right, you get more points.

Level 3 — Handling Noise

By using the consensus of many humans, CIDEr is robust to the "noisy" nature of human descriptions, where different people might focus on different details of a photo.

CIDEr Concept (Example)
# Minimal sketch using the pycocoevalcap package (assumed installed).
# Both dicts map image IDs to caption lists; res holds one candidate each.
from pycocoevalcap.cider.cider import Cider

refs = {"img1": ["a dog runs on the beach", "a dalmatian by the sea"]}
res = {"img1": ["a dog running on the beach"]}
# Higher CIDEr => better alignment with human consensus
score, per_image = Cider().compute_score(refs, res)
print(f"CIDEr score: {score:.3f}")