Hidden Markov Models (HMM)
Learn about sequence modeling using states, transitions, and emission probabilities in Hidden Markov Models.
Before Deep Learning, Hidden Markov Models (HMMs) were the gold standard for NLP sequence tasks like Part-of-Speech (POS) tagging and Speech Recognition. An HMM is a probabilistic graphical model: it describes a sequence of hidden states that generates a sequence of visible observations.
The Concept of "Hidden" States
Imagine you are trying to predict the weather (Sunny or Rainy) based only on how your friend dresses (T-shirt or Coat) when they come inside.
- Observations (Visible): What we can directly see, e.g., the words in a sentence ("The", "dog", "runs"), or your friend's clothes.
- Hidden States: The underlying labels we want to infer, e.g., the POS tags (Determiner, Noun, Verb), or the actual weather. (See the sketch after this list.)
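To make the split concrete, here is the weather analogy as plain Python data (the names are purely illustrative):

```python
# What we can see vs. what the model must infer.
hidden_states = ["Sunny", "Rainy"]    # the actual weather: never observed directly
observations  = ["T-shirt", "Coat"]   # the friend's clothes: all we get to see

# Given a sequence of observations...
observed = ["T-shirt", "T-shirt", "Coat"]
# ...the HMM's job is to recover the most likely hidden sequence,
# e.g. ["Sunny", "Sunny", "Rainy"].
```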
The Two Probabilities of HMM
1. Transition Probabilities
The probability of moving from one Hidden State to the next.
P( Noun | Determiner ) = 0.80
Meaning: If the current hidden state is a Determiner (e.g., the word "The"), there's an 80% chance the very next hidden state is a Noun.
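As a sketch, transition probabilities are often stored as a row-stochastic table: one row per current state, one entry per possible next state, with each row summing to 1. The numbers below are invented for this example, not estimated from real data:

```python
# Illustrative transition table: trans_p[current][next] = P(next | current).
# Each row sums to 1; none of these values come from a real corpus.
trans_p = {
    "Det":  {"Det": 0.01, "Noun": 0.80, "Verb": 0.19},
    "Noun": {"Det": 0.09, "Noun": 0.30, "Verb": 0.61},
    "Verb": {"Det": 0.40, "Noun": 0.40, "Verb": 0.20},
}

print(trans_p["Det"]["Noun"])   # 0.8  -> P( Noun | Determiner ) from the example above
print(trans_p["Noun"]["Verb"])  # 0.61 -> a Noun is often followed by a Verb
```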
2. Emission Probabilities
The probability of a Hidden State generating/emitting a specific Observation (Word).
P( "runs" | Verb ) = 0.02
Meaning: If the true underlying state is a Noun, there's a 5% chance the specific word written down is "dog".
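Emission probabilities fit the same nested-dictionary sketch: one row per hidden state, one entry per word it can emit. Again, the numbers are made up for illustration:

```python
# Illustrative emission table: emit_p[state][word] = P(word | state).
# Values are invented; a real tagger would estimate them from a tagged corpus.
emit_p = {
    "Det":  {"the": 0.60, "dog": 0.00, "runs": 0.00},
    "Noun": {"the": 0.00, "dog": 0.05, "runs": 0.01},
    "Verb": {"the": 0.00, "dog": 0.00, "runs": 0.02},
}

print(emit_p["Verb"]["runs"])  # 0.02 -> P( "runs" | Verb ) from the example above
print(emit_p["Noun"]["dog"])   # 0.05 -> a Noun emits the word "dog" 5% of the time
```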
The Viterbi Algorithm
If we give the HMM a sentence ("The dog runs"), how does it find the correct sequence of POS tags?
It uses Dynamic Programming via the Viterbi Algorithm. Instead of scoring every possible tag sequence (the number of sequences grows exponentially with sentence length), Viterbi works left to right, multiplying the Transition and Emission probabilities at each step and keeping, for every position and every state, only the single highest-scoring path that reaches it. Backtracking from the best final state then recovers the most probable sequence of hidden states through the trellis.
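Below is a minimal, self-contained sketch of Viterbi decoding for "The dog runs". It repeats the illustrative tables from above (so the snippet runs on its own) and adds an assumed start-state distribution, start_p, which the text doesn't specify. A production implementation would sum log-probabilities rather than multiply raw ones, since products of many small numbers underflow quickly:

```python
# Viterbi decoding sketch; all probabilities are invented for illustration.
states = ["Det", "Noun", "Verb"]

start_p = {"Det": 0.8, "Noun": 0.15, "Verb": 0.05}  # assumed P(first tag)

trans_p = {  # trans_p[current][next] = P(next tag | current tag)
    "Det":  {"Det": 0.01, "Noun": 0.80, "Verb": 0.19},
    "Noun": {"Det": 0.09, "Noun": 0.30, "Verb": 0.61},
    "Verb": {"Det": 0.40, "Noun": 0.40, "Verb": 0.20},
}
emit_p = {  # emit_p[state][word] = P(word | tag)
    "Det":  {"the": 0.60, "dog": 0.00, "runs": 0.00},
    "Noun": {"the": 0.00, "dog": 0.05, "runs": 0.01},
    "Verb": {"the": 0.00, "dog": 0.00, "runs": 0.02},
}

def viterbi(words, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state path for `words`."""
    # trellis[t][s] = (prob of the best path ending in state s at step t,
    #                  the state that path was in at step t-1)
    trellis = [{s: (start_p[s] * emit_p[s][words[0]], None) for s in states}]
    for word in words[1:]:
        row = {}
        for s in states:
            # Best predecessor: maximize (path prob so far) * (transition prob).
            prev, prob = max(
                ((p, trellis[-1][p][0] * trans_p[p][s]) for p in states),
                key=lambda pair: pair[1],
            )
            row[s] = (prob * emit_p[s][word], prev)  # then multiply in emission
        trellis.append(row)
    # Pick the best final state, then follow the stored back-pointers.
    best = max(trellis[-1], key=lambda s: trellis[-1][s][0])
    path = [best]
    for row in reversed(trellis[1:]):
        path.append(row[path[-1]][1])
    return list(reversed(path))

print(viterbi(["the", "dog", "runs"], states, start_p, trans_p, emit_p))
# -> ['Det', 'Noun', 'Verb']
```

Note how each trellis cell stores a back-pointer to its best predecessor; that is what makes the final backtracking step cheap.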