Attention Mechanism
Learn how models learn to focus on specific parts of the input.
The Attention Mechanism
The attention mechanism revolutionized NLP by letting a model look over the entire input sequence dynamically at every step of generating an output, rather than relying on a single static context vector.
How Attention Solved the Bottleneck
During translation (e.g., from English to French), when the decoder wants to output the French word for "apple," it doesn't just look at the static context vector. Instead, it looks back at all the hidden states of the encoder, assigns a weight (attention score) to each English word, and heavily focuses its "attention" specifically on the English word "apple".
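The scoring-and-weighting step above can be sketched in a few lines of NumPy. This is a minimal illustration using dot-product scoring (one common choice; the original Bahdanau attention uses a learned additive score instead), and the toy vectors and word labels are invented for the example:

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(decoder_state, encoder_states):
    """Dot-product attention: score each encoder hidden state against
    the current decoder state, normalize the scores with softmax, and
    return the weighted sum (the context vector) plus the weights."""
    scores = encoder_states @ decoder_state   # one score per source word
    weights = softmax(scores)                 # attention distribution
    context = weights @ encoder_states        # weighted sum of encoder states
    return context, weights

# Toy example: 4 source words, hidden size 3 (values are illustrative).
encoder_states = np.array([
    [1.0, 0.0, 0.0],   # "I"
    [0.0, 1.0, 0.0],   # "ate"
    [0.0, 0.0, 1.0],   # "an"
    [0.9, 0.9, 0.0],   # "apple"
])
decoder_state = np.array([1.0, 1.0, 0.0])  # decoder about to emit "pomme"

context, weights = attention(decoder_state, encoder_states)
print(weights)  # the largest weight falls on the "apple" state
```

Each decoding step recomputes the weights, so the decoder's focus shifts across the source sentence as the output is generated.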
Alignment
Attention provides an automatic, implicit alignment between source and target languages without any explicit linguistic rules.
Infinite Context
Because the decoder can attend directly to every encoder hidden state, translation quality no longer drops drastically on long sentences.