CIDEr Score
Tutorial
Consensus-based evaluation for image captioning.
CIDEr (Consensus-based Image Description Evaluation) is a metric designed specifically for Image Captioning tasks.
Level 1 — The Consensus Method
Instead of matching against a single reference, CIDEr compares the machine caption to a "consensus" of several human captions (typically five or more per image). It measures how closely the machine caption aligns with what most humans describe in the image.
Level 2 — TF-IDF Weighting
CIDEr is smart: it applies TF-IDF weighting over n-grams, giving more weight to rare, descriptive terms (like "Dalmatian") and less weight to common ones (like "dog"). Getting the rare, informative words right earns more credit.
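To see why rare words earn more, here is a minimal TF-IDF sketch over single words (CIDEr applies the same idea to n-grams up to length four). The corpus, captions, and the `1 + df` smoothing are illustrative assumptions, not part of the official metric.

```python
import math
from collections import Counter

def tfidf_weights(caption, corpus):
    """TF-IDF weight per word: frequent in this caption, rare across the corpus."""
    tokens = caption.lower().split()
    tf = Counter(tokens)
    n_docs = len(corpus)
    weights = {}
    for word, count in tf.items():
        df = sum(1 for doc in corpus if word in doc.lower().split())
        idf = math.log(n_docs / (1 + df))  # rarer across the corpus => larger idf
        weights[word] = (count / len(tokens)) * idf
    return weights

corpus = ["a dog on grass", "a dog sleeping", "a dalmatian catching a frisbee"]
w = tfidf_weights("a dalmatian dog", corpus)
print(w)  # "dalmatian" (rare) is weighted above "dog" (common)
```

Because "dog" appears in most captions, its idf is near zero; "dalmatian" appears in only one, so matching it moves the score far more.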
Level 3 — Handling Noise
By using the consensus of many humans, CIDEr is robust to the "noisy" nature of human descriptions, where different people might focus on different details of a photo.
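Putting the two ideas together, a simplified CIDEr-style score is the TF-IDF cosine similarity averaged over all references; averaging is what absorbs the noise when different annotators focus on different details. The `idf` table below is hypothetical (in practice it is precomputed over a large caption corpus), and this sketch omits CIDEr's multi-length n-grams and length penalty.

```python
import math
from collections import Counter

def tfidf_vec(tokens, idf):
    """TF-IDF vector for a token list, given a precomputed idf table."""
    tf = Counter(tokens)
    return {w: (c / len(tokens)) * idf.get(w, 0.0) for w, c in tf.items()}

def cosine(a, b):
    dot = sum(a[w] * b.get(w, 0.0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def mini_cider(candidate, references, idf):
    """Average TF-IDF cosine over all human references (simplified CIDEr-1)."""
    cand = tfidf_vec(candidate.lower().split(), idf)
    ref_vecs = [tfidf_vec(r.lower().split(), idf) for r in references]
    return sum(cosine(cand, r) for r in ref_vecs) / len(ref_vecs)

# Hypothetical idf values, as if estimated from a large caption corpus
idf = {"dalmatian": 3.0, "dog": 0.5, "beach": 1.0, "runs": 1.2,
       "running": 1.2, "a": 0.1, "on": 0.1, "the": 0.1}
refs = ["a dalmatian running on the beach", "a dog runs on the beach"]
print(mini_cider("a dalmatian on the beach", refs, idf))
```

Even though the two references disagree on wording, the candidate scores well against both, so individual annotator quirks wash out in the average.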
CIDEr Concept (Pseudocode)
# CIDEr involves TF-IDF vector math over n-grams from multiple references.
# Higher CIDEr => better alignment with the human consensus.
# Sketch of a pycocoevalcap-style call: refs and res map image ids to caption lists,
# and compute_score returns a corpus-level score plus per-image scores.
score, per_image_scores = cider_scorer.compute_score(refs, res)
print(f"CIDEr score: {score:.3f}")