GPT Models
Generative Pre-trained Transformers by OpenAI.
GPT (Generative Pre-trained Transformer) is the family of models from OpenAI that made AI mainstream. While BERT is built for "understanding" text, GPT is built for "generating" it.
Level 1 — Autoregressive Generation
GPT models are Autoregressive. This means they predict the next word, append it to the input, and then use the extended sequence to predict the word after that, and so on. It's essentially a very advanced "Autocomplete."
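The loop above can be sketched in a few lines. This is a toy illustration, not a real model: the hard-coded bigram table below stands in for GPT's learned probability distribution over its vocabulary, so the feedback loop stays visible.

```python
# Toy sketch of autoregressive generation. NEXT_WORD is a made-up stand-in
# for a real model's next-token prediction.
NEXT_WORD = {
    "the": "future",
    "future": "of",
    "of": "NLP",
    "NLP": "is",
    "is": "bright",
}

def generate(prompt_words, max_new_tokens=5):
    words = list(prompt_words)
    for _ in range(max_new_tokens):
        prediction = NEXT_WORD.get(words[-1])  # predict from the last word
        if prediction is None:                 # stop when there is no prediction
            break
        words.append(prediction)               # feed the prediction back in
    return " ".join(words)

print(generate(["the"]))  # the future of NLP is bright
```

Each predicted word becomes part of the input for the next prediction, which is exactly what "autoregressive" means.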
Level 2 — Decoder-only Architecture
Unlike BERT (Encoder-only) or T5 (Encoder-Decoder), GPT uses only the Decoder part of the Transformer. It uses "Masked Self-Attention" to ensure it only looks at the past and never cheats by looking at future words.
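The masking idea can be shown concretely. This is a minimal NumPy sketch (not GPT's actual implementation): positions above the diagonal, the "future," are set to negative infinity before the softmax, so they receive zero attention weight.

```python
import numpy as np

# Causal (masked) self-attention mask for a 4-token sequence.
# Token i may only attend to tokens 0..i.
seq_len = 4
scores = np.zeros((seq_len, seq_len))             # stand-in attention scores
mask = np.triu(np.ones((seq_len, seq_len)), k=1)  # 1s above the diagonal = future
masked = np.where(mask == 1, -np.inf, scores)

# Softmax row by row: exp(-inf) = 0, so future tokens get zero weight.
weights = np.exp(masked) / np.exp(masked).sum(axis=-1, keepdims=True)
print(weights.round(2))
```

The first row attends only to token 0, while the last row spreads its attention over all four tokens; no row ever assigns weight to a position to its right.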
Level 3 — Zero-Shot and Few-Shot Learning
Starting with GPT-3, these models showed they could perform tasks (like translation) without seeing a single training example for that specific task, just by understanding the instruction. This is called Zero-Shot Learning. In Few-Shot Learning, a handful of worked examples are included directly in the prompt, and the model picks up the pattern without any weight updates.
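The difference is purely in how the prompt is written. Below is a sketch with hypothetical prompt strings (the translation task and example pairs are made up for illustration); in both cases the text is sent to the model as-is, with no fine-tuning.

```python
# Zero-shot: instruction only, no examples.
zero_shot = "Translate English to French: cheese ->"

# Few-shot: the same instruction plus a few worked examples in the prompt.
few_shot = (
    "Translate English to French:\n"
    "sea otter -> loutre de mer\n"
    "plush giraffe -> girafe peluche\n"
    "cheese ->"
)

print(zero_shot)
print(few_shot)
```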
A minimal working example with the Hugging Face transformers library, using the small GPT-2 model:

from transformers import pipeline

# Load GPT-2 for autoregressive text generation
generator = pipeline("text-generation", model="gpt2")

prompt = "The future of NLP is"
# max_length caps the total length (prompt + generated tokens)
result = generator(prompt, max_length=30, num_return_sequences=1)
print(result[0]["generated_text"])