Attention Mechanism
Learn how models learn to focus on specific parts of the input.
The Attention Mechanism
The attention mechanism revolutionized NLP by letting a model look over the entire input sequence dynamically at every step of generating an output, rather than relying on a single static context vector.
How Attention Solved the Bottleneck
During translation (e.g., from English to French), when the decoder wants to output the French word for "apple," it doesn't just look at the static context vector. Instead, it looks back at all the hidden states of the encoder, assigns a weight (attention score) to each English word, and heavily focuses its "attention" specifically on the English word "apple".
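The scoring-and-weighting step above can be sketched in a few lines of NumPy. This is a minimal illustration using dot-product scoring (one common choice; the original Bahdanau attention uses a learned additive score instead), and the toy vectors and word labels are invented for the example:

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(decoder_state, encoder_states):
    """Dot-product attention: score each encoder hidden state against
    the current decoder state, normalize the scores with softmax, and
    return the weighted sum (the context vector) plus the weights."""
    scores = encoder_states @ decoder_state   # one score per source word
    weights = softmax(scores)                 # attention distribution
    context = weights @ encoder_states        # weighted sum of encoder states
    return context, weights

# Toy example: 4 source words, hidden size 3 (values are illustrative).
encoder_states = np.array([
    [1.0, 0.0, 0.0],   # "I"
    [0.0, 1.0, 0.0],   # "ate"
    [0.0, 0.0, 1.0],   # "an"
    [0.9, 0.9, 0.0],   # "apple"
])
decoder_state = np.array([1.0, 1.0, 0.0])  # decoder about to emit "pomme"

context, weights = attention(decoder_state, encoder_states)
print(weights)  # the largest weight falls on the "apple" state
```

Each decoding step recomputes the weights, so the decoder's focus shifts across the source sentence as the output is generated.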
Alignment
Attention provides an automatic, implicit alignment between source and target languages without any explicit linguistic rules.
Infinite Context
Because the decoder can attend directly to every encoder hidden state, translation quality no longer drops drastically on long sentences.