METEOR Score Tutorial

METEOR Score

A metric that improves on BLEU by allowing synonym and root-word matches.

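METEOR (Metric for Evaluation of Translation with Explicit ORdering) was created to address the rigidity of BLEU. BLEU counts only exact n-gram overlap, so a candidate that says "happy" where the reference says "glad" gets no credit for that word. METEOR can still match the two.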

Level 1 — Flexible Matching

METEOR matches words in three stages:

Exact: Identical strings.
Stemming: Same root (walk vs walking).
Synonymy: Same meaning (big vs large).
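
The three stages above can be sketched in a few lines of Python. This is a toy illustration rather than METEOR's actual implementation: the SYNONYMS table and the naive_stem helper are hypothetical stand-ins for WordNet and a real stemmer, and the pairing is a simple greedy pass. Each stage only looks at words that earlier stages left unmatched.

```python
# Toy sketch of METEOR-style staged matching (not the NLTK implementation).
# SYNONYMS and naive_stem are hypothetical stand-ins for WordNet and a real stemmer.
SYNONYMS = {frozenset({"big", "large"}), frozenset({"happy", "glad"})}


def naive_stem(word):
    """Strip a few common English suffixes; a real system would use a proper stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def staged_matches(candidate, reference):
    """Greedily pair candidate and reference tokens, one matching stage at a time."""
    stages = [
        ("exact", lambda c, r: c == r),
        ("stem", lambda c, r: naive_stem(c) == naive_stem(r)),
        ("synonym", lambda c, r: frozenset({c, r}) in SYNONYMS),
    ]
    free_cand = set(range(len(candidate)))
    free_ref = set(range(len(reference)))
    matches = []
    for name, same in stages:
        # Only tokens still unmatched after earlier stages are considered.
        for i in sorted(free_cand):
            for j in sorted(free_ref):
                if same(candidate[i], reference[j]):
                    matches.append((i, j, name))
                    free_cand.discard(i)
                    free_ref.discard(j)
                    break
    return sorted(matches)


candidate = "the large dog was walking".split()
reference = "the big dog walked".split()
for cand_idx, ref_idx, stage in staged_matches(candidate, reference):
    print(f"{candidate[cand_idx]} -> {reference[ref_idx]} ({stage})")
```

In this example "the" and "dog" match exactly, "walking" and "walked" match on their stem, "large" and "big" match through the synonym table, and "was" stays unmatched.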

Level 2 — Correlation with Humans

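Because METEOR credits stems and synonyms and also takes recall into account, its scores tend to track human judgment more closely than BLEU's, especially for individual sentences. A translation that a human rates as good is more likely to receive a high METEOR score as well.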

Level 3 — Advanced Alignment

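METEOR builds a one-to-one alignment between the words of the machine output and those of the human reference. Matched words that appear next to each other and in the same order in both sentences form a chunk, and the score is penalized when the matches are scattered across many chunks. Word order therefore affects the result as well as word choice.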
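
To make the scoring step concrete, here is a minimal sketch of how a score can be computed once an alignment is known. It assumes the commonly cited METEOR formulation: a recall-weighted harmonic mean of precision and recall, multiplied by a fragmentation penalty. The function names are hypothetical, and the defaults alpha=0.9, beta=3 and gamma=0.5 are assumed here to mirror the usual settings; check them against the implementation you use.

```python
def count_chunks(matches):
    """Count runs of matches that are adjacent and in the same order in both sentences."""
    chunks = 0
    prev = None
    for cand_idx, ref_idx in sorted(matches):
        # A new chunk starts whenever this pair does not directly continue the previous one.
        if prev is None or (cand_idx, ref_idx) != (prev[0] + 1, prev[1] + 1):
            chunks += 1
        prev = (cand_idx, ref_idx)
    return chunks


def meteor_from_alignment(matches, cand_len, ref_len, alpha=0.9, beta=3.0, gamma=0.5):
    """Score a list of (candidate_index, reference_index) pairs.

    alpha, beta and gamma are assumed defaults, not values taken from this tutorial.
    """
    if not matches:
        return 0.0
    m = len(matches)
    precision = m / cand_len
    recall = m / ref_len
    # Harmonic mean that weights recall more heavily than precision when alpha > 0.5.
    f_mean = precision * recall / (alpha * precision + (1 - alpha) * recall)
    # More chunks per matched word means a more fragmented alignment.
    penalty = gamma * (count_chunks(matches) / m) ** beta
    return f_mean * (1 - penalty)


# Two matched words ("brown", "fox") at the same positions in two 4-word sentences.
print(meteor_from_alignment([(2, 2), (3, 3)], cand_len=4, ref_len=4))
# All four words matched, but in reversed order: four chunks, so a heavy penalty.
print(meteor_from_alignment([(0, 3), (1, 2), (2, 1), (3, 0)], cand_len=4, ref_len=4))
```

The second call shows the effect of ordering: every word is matched, so precision and recall are both 1, yet the reversed order splits the alignment into four single-word chunks and the penalty halves the score.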

METEOR in NLTK

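import nltk
from nltk.translate.meteor_score import meteor_score

# The synonym stage looks words up in WordNet, so the corpus must be downloaded first.
# Depending on the NLTK version, nltk.download('omw-1.4') may also be needed.
nltk.download('wordnet')

# Recent NLTK versions expect pre-tokenized input:
# a list of reference token lists and a single list of candidate tokens.
reference = "the quick brown fox".split()
candidate = "a fast brown fox".split()

score = meteor_score([reference], candidate)
print(f"METEOR score: {score:.4f}")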
