METEOR Score
A metric that improves on BLEU by allowing synonym and root-word matches.
METEOR was created to address the rigidity of BLEU. BLEU counts only exact word matches, so writing "happy" where the reference says "glad" earns no credit for that word. METEOR can still match the pair.
Level 1 — Flexible Matching
METEOR matches words in three stages:
- Exact: Identical strings.
- Stemming: Same root (walk vs walking).
- Synonymy: Same meaning (big vs large).
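The three stages can be sketched in a few lines of Python. This is only an illustration: `toy_stem` and `TOY_SYNONYMS` are hypothetical stand-ins for a real stemmer and for WordNet, and `match_stage` is a name invented for this example, not part of any library.

```python
# Hypothetical toy synonym table standing in for WordNet (an assumption for illustration).
TOY_SYNONYMS = {frozenset({"big", "large"}), frozenset({"happy", "glad"})}


def toy_stem(word):
    """Strip a few common suffixes: a crude stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def match_stage(cand_word, ref_word):
    """Return the first stage at which the two words match, or None."""
    c, r = cand_word.lower(), ref_word.lower()
    if c == r:
        return "exact"
    if toy_stem(c) == toy_stem(r):
        return "stem"
    if frozenset({c, r}) in TOY_SYNONYMS:
        return "synonym"
    return None


for pair in [("fox", "fox"), ("walking", "walk"), ("large", "big"), ("fox", "dog")]:
    print(pair, "->", match_stage(*pair))
```

The stages are tried in order, so a pair that already matches exactly is never passed on to the stemmer or the synonym lookup.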
Level 2 — Correlation with Humans
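Because METEOR credits stem and synonym matches and weights recall more heavily than precision, it tends to correlate better with human judgment than BLEU does, particularly at the sentence level. If a human judges a translation to be good, METEOR is more likely than BLEU to agree.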
Level 3 — Advanced Alignment
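METEOR aligns the machine output with the human reference so that each word maps to at most one word on the other side. When several alignments match the same number of words, it prefers the one with the fewest chunks, that is, runs of matched words that are contiguous and in the same order in both sentences. The final score combines unigram precision and recall and then applies a penalty that grows as the matches become more fragmented, so scrambled word order lowers the score.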
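To make the scoring step concrete, here is a minimal sketch of how precision, recall, and the fragmentation penalty combine. It makes several simplifying assumptions: only exact matches are aligned, the alignment is greedy rather than chosen to minimise chunks, and the function names (`align_exact`, `count_chunks`, `meteor_sketch`) are invented for this example. The defaults `alpha=0.9`, `beta=3.0`, `gamma=0.5` mirror the default parameters of NLTK's `meteor_score`.

```python
def align_exact(candidate, reference):
    """Greedily map each candidate token to the earliest unused identical
    reference token. Returns (candidate_index, reference_index) pairs."""
    used = set()
    pairs = []
    for i, tok in enumerate(candidate):
        for j, ref_tok in enumerate(reference):
            if j not in used and tok == ref_tok:
                used.add(j)
                pairs.append((i, j))
                break
    return pairs


def count_chunks(pairs):
    """Count runs of matches that are adjacent and in order in both sentences."""
    chunks = 0
    prev = None
    for i, j in pairs:
        if prev is None or i != prev[0] + 1 or j != prev[1] + 1:
            chunks += 1
        prev = (i, j)
    return chunks


def meteor_sketch(candidate, reference, alpha=0.9, beta=3.0, gamma=0.5):
    """Simplified METEOR-style score using exact matches only."""
    pairs = align_exact(candidate, reference)
    m = len(pairs)
    if m == 0:
        return 0.0
    precision = m / len(candidate)
    recall = m / len(reference)
    # Weighted harmonic mean; alpha close to 1 favours recall.
    f_mean = precision * recall / (alpha * precision + (1 - alpha) * recall)
    # Fewer, longer chunks mean a smaller penalty.
    penalty = gamma * (count_chunks(pairs) / m) ** beta
    return (1 - penalty) * f_mean


reference = "the quick brown fox".split()
candidate = "a fast brown fox".split()
print(meteor_sketch(candidate, reference))
```

Because the penalty is never zero when at least one word matches, even a candidate identical to the reference scores slightly below 1.0 under this formula. A full implementation would also count stem and synonym matches, so its score can be higher than this sketch's for the same sentence pair.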
METEOR in NLTK
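import nltk
from nltk.translate.meteor_score import meteor_score
# NLTK needs the WordNet corpus for synonym matching
# (some NLTK versions also require 'omw-1.4').
nltk.download('wordnet')
# Recent NLTK releases expect pre-tokenized input:
# a list of reference token lists and one list of candidate tokens.
reference = "the quick brown fox".split()
candidate = "a fast brown fox".split()
score = meteor_score([reference], candidate)
print(f"METEOR score: {score:.4f}")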