Seq2Seq Models
Understand the foundational Encoder-Decoder framework for translation.
Sequence-to-Sequence (Seq2Seq)
The Seq2Seq (Encoder-Decoder) architecture maps an input sequence (like an English sentence) to an output sequence that may have a completely different length (like a French sentence). It is the backbone of Machine Translation and Summarization.
The Two Components
- 1. Encoder: An RNN (usually an LSTM) that reads the input sequence step by step and compresses the entire sequence into a single fixed-size vector called the Context Vector.
- 2. Decoder: A second RNN that takes this Context Vector as its initial hidden state and generates the output sequence one token at a time until it produces an [END] token.
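The two components above can be sketched as a minimal encoder-decoder loop. This is a toy NumPy sketch with plain tanh RNN cells instead of LSTMs, random untrained weights, and assumed sizes (HIDDEN, VOCAB, the END token id are all illustrative choices, not from the original text):

```python
import numpy as np

rng = np.random.default_rng(0)
HIDDEN = 8   # size of the fixed context vector (assumed for illustration)
VOCAB = 5    # toy vocabulary; token 4 plays the role of [END]
END = 4

# Randomly initialized toy weights -- untrained, purely illustrative.
W_enc = rng.normal(scale=0.1, size=(HIDDEN, HIDDEN))
U_enc = rng.normal(scale=0.1, size=(HIDDEN, VOCAB))
W_dec = rng.normal(scale=0.1, size=(HIDDEN, HIDDEN))
U_dec = rng.normal(scale=0.1, size=(HIDDEN, VOCAB))
V_out = rng.normal(scale=0.1, size=(VOCAB, HIDDEN))

def one_hot(tok):
    v = np.zeros(VOCAB)
    v[tok] = 1.0
    return v

def encode(tokens):
    """Read the input step by step; the final hidden state is the Context Vector."""
    h = np.zeros(HIDDEN)
    for tok in tokens:
        h = np.tanh(W_enc @ h + U_enc @ one_hot(tok))
    return h  # fixed-size, regardless of len(tokens)

def decode(context, max_len=10):
    """Start from the context vector; emit greedily until END (or max_len)."""
    h, tok, out = context, END, []  # reuse END as a start token for simplicity
    for _ in range(max_len):
        h = np.tanh(W_dec @ h + U_dec @ one_hot(tok))
        tok = int(np.argmax(V_out @ h))  # greedy choice of next token
        if tok == END:
            break
        out.append(tok)
    return out

context = encode([0, 1, 2, 3, 1])
print(context.shape)      # one vector summarizes the whole input sentence
print(decode(context))    # output length is decided by the decoder, not the input
```

Note that the output length is not tied to the input length: the decoder keeps generating until it emits [END], which is exactly what lets translation produce sentences of a different length.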
The Bottleneck Problem:
In a vanilla Seq2Seq model, all the information in a 50-word sentence must be squeezed into a single fixed-size vector before decoding begins. This "information bottleneck" degrades translation quality as sentences grow longer. (The solution is Attention!)
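The bottleneck is easy to see numerically: the encoder's output has the same shape whether it reads 5 words or 50. A minimal sketch, assuming a 16-dimensional hidden state and scalar "embeddings" for simplicity (both illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
HIDDEN = 16  # assumed context size for illustration
W = rng.normal(scale=0.1, size=(HIDDEN, HIDDEN))
U = rng.normal(scale=0.1, size=(HIDDEN, 1))

def context_vector(sentence):
    """Vanilla encoder: whatever the input length, the output is one HIDDEN-dim vector."""
    h = np.zeros(HIDDEN)
    for x in sentence:  # x is a scalar stand-in for a word embedding
        h = np.tanh(W @ h + U @ np.array([x]))
    return h

short = context_vector(np.linspace(0, 1, 5))    # 5-word sentence
long_ = context_vector(np.linspace(0, 1, 50))   # 50-word sentence
print(short.shape, long_.shape)  # same capacity, 10x the content
```

Ten times the content, identical storage: that mismatch is the bottleneck, and Attention sidesteps it by letting the decoder look back at every encoder state instead of just this one vector.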