Seq2Seq Models Tutorial

Understand the foundational Encoder-Decoder framework for translation.

Sequence-to-Sequence (Seq2Seq)

The Seq2Seq (Encoder-Decoder) architecture maps an input sequence (like an English sentence) to an output sequence of a completely different length (like a French sentence). It is the backbone of Machine Translation and Summarization.

The Two Components

  1. Encoder: An RNN (usually an LSTM) that reads the input sequence step by step and compresses the entire sequence into a single fixed-size vector called the Context Vector.
  2. Decoder: A second RNN that takes the Context Vector as its initial hidden state and generates the output sequence one token at a time until it produces an [END] token.
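The two components above can be sketched in a few lines of NumPy. This is an illustrative, untrained toy (random weights, hypothetical sizes, and a vanilla RNN cell instead of an LSTM), not a real translation model; the point is the data flow: the encoder's final hidden state is the fixed-size context vector, and the decoder unrolls from it until it emits [END].

```python
import numpy as np

rng = np.random.default_rng(0)
HIDDEN, VOCAB = 8, 5           # toy sizes (hypothetical)
END = VOCAB - 1                # reserve the last id as the [END] token

# Random (untrained) weights for the two RNNs and the output projection.
W_enc = rng.normal(size=(HIDDEN, HIDDEN))
U_enc = rng.normal(size=(HIDDEN, VOCAB))
W_dec = rng.normal(size=(HIDDEN, HIDDEN))
U_dec = rng.normal(size=(HIDDEN, VOCAB))
V_out = rng.normal(size=(VOCAB, HIDDEN))

def one_hot(token):
    v = np.zeros(VOCAB)
    v[token] = 1.0
    return v

def encode(tokens):
    """Read the input step by step; the final hidden state is the Context Vector."""
    h = np.zeros(HIDDEN)
    for t in tokens:
        h = np.tanh(W_enc @ h + U_enc @ one_hot(t))
    return h                                   # one fixed-size vector, regardless of input length

def decode(context, max_len=10):
    """Start from the Context Vector and emit tokens until [END] (greedy decoding)."""
    h, prev, out = context, END, []            # use [END] as the start-of-sequence input
    for _ in range(max_len):
        h = np.tanh(W_dec @ h + U_dec @ one_hot(prev))
        prev = int(np.argmax(V_out @ h))       # pick the highest-scoring next token
        if prev == END:
            break
        out.append(prev)
    return out

context = encode([0, 1, 2, 3])
print(context.shape)           # the whole input is squeezed into a (HIDDEN,) vector
print(decode(context))         # output length is independent of input length
```

Note that `encode` returns the same shape whether the input has 2 tokens or 50 -- exactly the bottleneck discussed next.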

The Bottleneck Problem: In a vanilla Seq2Seq model, all the information in a 50-word sentence must be squeezed into one small fixed-size vector before decoding can begin. This "information bottleneck" degrades quality on long sentences. (The solution is Attention!)