Computer Vision Interview · 20 Essential Q&A · Updated 2026

AlexNet: 20 Essential Q&A

The architecture that popularized deep CNNs on ImageNet—ReLU, dropout, and GPU scale.

~10 min read 20 questions Advanced
ImageNet · ReLU · dropout · LRN
1 Why is AlexNet important? ⚡ easy
Answer: Won ImageNet 2012 by a large margin—showed deep CNNs + GPU + data could beat hand-crafted features, sparking the deep learning boom in vision.
2 What was ImageNet 2012? 📊 medium
Answer: 1.2M images, 1000 classes—AlexNet ~16% top-5 error vs previous ~26% with shallow methods—breakthrough result.
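To make the metric concrete, here is a toy top-5 check over made-up class scores (8 classes standing in for the full 1,000):

```python
def top5_correct(scores, true_label):
    """Return True if true_label is among the five highest-scoring classes."""
    top5 = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)[:5]
    return true_label in top5

# Toy scores over 8 classes; class 3 ranks 2nd, so it counts as a top-5 hit.
scores = [0.1, 0.05, 0.3, 0.25, 0.02, 0.08, 0.15, 0.05]
print(top5_correct(scores, 3))  # True
print(top5_correct(scores, 4))  # False: class 4 is ranked last
```

Top-5 error is simply the fraction of images for which this check fails.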
3 Rough architecture? 📊 medium
Answer: Five conv layers (some grouped across 2 GPUs) + max pooling + three large FC layers, the last feeding a 1000-way softmax—deeper than prior CNNs for this task.
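The spatial arithmetic behind that stack can be traced with the standard output-size formula. A sketch assuming the commonly cited 227×227 input (which makes the stride-4 first conv come out even) and the paper's kernel/stride/padding values:

```python
def out_size(n, k, s, p):
    # Standard conv/pool size formula: floor((n + 2p - k) / s) + 1
    return (n + 2 * p - k) // s + 1

n = 227  # 227 makes the 11x11 stride-4 conv produce exactly 55
for name, (k, s, p) in [("conv1", (11, 4, 0)), ("pool1", (3, 2, 0)),
                        ("conv2", (5, 1, 2)),  ("pool2", (3, 2, 0)),
                        ("conv3", (3, 1, 1)),  ("conv4", (3, 1, 1)),
                        ("conv5", (3, 1, 1)),  ("pool5", (3, 2, 0))]:
    n = out_size(n, k, s, p)
    print(name, n)  # 55, 27, 27, 13, 13, 13, 13, 6
```

The final 6×6 map (with 256 channels) is what feeds the first FC layer.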
4 Why ReLU? 📊 medium
Answer: Faster training than saturating tanh/sigmoid; mitigates vanishing gradient in deep stacks; sparse activations.
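A minimal comparison of the two gradients on toy pre-activation values shows the difference: tanh's gradient vanishes for large inputs, ReLU's stays at 1.

```python
import math

def relu(x):      return max(0.0, x)
def relu_grad(x): return 1.0 if x > 0 else 0.0
def tanh_grad(x): return 1.0 - math.tanh(x) ** 2  # derivative of tanh

# For large positive pre-activations tanh's gradient shrinks toward 0,
# while ReLU passes the gradient through unchanged.
for x in (0.5, 3.0, 10.0):
    print(x, relu_grad(x), tanh_grad(x))
```

Multiplied across many layers, those near-zero tanh gradients stall learning, which is why ReLU trained several times faster in the paper's experiments.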
5 Use of dropout? 📊 medium
Answer: Regularize huge FC layers by randomly zeroing neurons—reduces co-adaptation on training set.
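A sketch of inverted dropout on made-up activations (the modern formulation; the original paper instead kept activations unscaled during training and halved the outputs at test time—mathematically equivalent in expectation):

```python
import random

def dropout(activations, p=0.5, training=True, rng=random.Random(0)):
    """Inverted dropout: zero each unit with prob p, scale survivors by 1/(1-p)."""
    if not training:
        return list(activations)  # identity at inference
    return [0.0 if rng.random() < p else a / (1 - p) for a in activations]

acts = [0.2, 1.5, 0.7, 2.0]
print(dropout(acts))                   # roughly half the units zeroed, rest doubled
print(dropout(acts, training=False))   # unchanged at inference
```

AlexNet applied p=0.5 to the first two FC layers, which hold most of the parameters.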
6 What was LRN? 🔥 hard
Answer: Local response normalization—lateral inhibition across neighboring channels; later often replaced by batch norm; minor effect in hindsight.
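The paper's LRN formula, sketched for one spatial position (hyperparameters k=2, n=5, α=1e-4, β=0.75 are the published values):

```python
def lrn(acts, i, k=2.0, n=5, alpha=1e-4, beta=0.75):
    """AlexNet-style LRN for channel i: divide by a power of the summed
    squared activations over the n neighboring channels at the same pixel."""
    lo = max(0, i - n // 2)
    hi = min(len(acts) - 1, i + n // 2)
    denom = (k + alpha * sum(acts[j] ** 2 for j in range(lo, hi + 1))) ** beta
    return acts[i] / denom

channels = [1.0, 5.0, 2.0, 0.5]  # toy activations: one pixel, four channels
print([round(lrn(channels, i), 4) for i in range(4)])
```

A channel surrounded by strongly active neighbors gets damped more—hence "lateral inhibition."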
7 Overlapping pooling? 📊 medium
Answer: Stride smaller than pool window—slightly richer downsampling vs non-overlapping; less common in newer nets.
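A 1-D toy comparison makes the overlap visible—with stride 2 and window 3 (AlexNet's choice), adjacent windows share one element:

```python
def max_pool_1d(xs, k, s):
    """Max pooling over windows of size k with stride s."""
    return [max(xs[i:i + k]) for i in range(0, len(xs) - k + 1, s)]

xs = [1, 4, 2, 7, 3, 6, 0, 5]
print(max_pool_1d(xs, 3, 2))  # overlapping: [4, 7, 6]
print(max_pool_1d(xs, 2, 2))  # non-overlapping: [4, 7, 6, 5]
```

The paper reported a small accuracy gain and slightly reduced overfitting from the overlap.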
8 Two GPUs? ⚡ easy
Answer: Model split across GPUs due to memory limits—cross-GPU connections only on certain layers (engineering constraint of the time).
9 Augmentation? 📊 medium
Answer: Random crops/flips from 256×256, PCA color jitter—reduces overfitting and increases effective data.
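A toy sketch of the crop + flip part (PCA color jitter is omitted—it needs an eigendecomposition of the RGB covariance; `random_crop_flip` and the tiny stand-in image are illustrative, not AlexNet code):

```python
import random

def random_crop_flip(img, crop, rng=random.Random(0)):
    """Random square crop plus a 50% chance of horizontal flip (sketch)."""
    h, w = len(img), len(img[0])
    top  = rng.randrange(h - crop + 1)
    left = rng.randrange(w - crop + 1)
    patch = [row[left:left + crop] for row in img[top:top + crop]]
    if rng.random() < 0.5:               # flip half the time
        patch = [row[::-1] for row in patch]
    return patch

img = [[r * 10 + c for c in range(6)] for r in range(6)]  # stand-in for 256x256
patch = random_crop_flip(img, 4)
print(len(patch), len(patch[0]))  # 4 4
```

With 224×224 crops from 256×256 plus flips, each image yields thousands of distinct training views—the "increases effective data" point above.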
10 Parameters? ⚡ easy
Answer: On the order of 60M—mostly in the FC layers; later architectures cut FC params with global average pooling (GAP).
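That figure can be rederived from the layer shapes. A sketch using the standard published dimensions (grouped conv layers see only half the input channels):

```python
# (fan_in, fan_out) per layer; weights = fan_in * fan_out, plus fan_out biases.
layers = {
    "conv1": (11 * 11 * 3, 96),
    "conv2": (5 * 5 * 48, 256),    # 2 groups: each filter sees 48 of 96 channels
    "conv3": (3 * 3 * 256, 384),
    "conv4": (3 * 3 * 192, 384),   # grouped
    "conv5": (3 * 3 * 192, 256),   # grouped
    "fc6":   (256 * 6 * 6, 4096),  # flattened 6x6x256 conv output
    "fc7":   (4096, 4096),
    "fc8":   (4096, 1000),
}
params = {name: fi * fo + fo for name, (fi, fo) in layers.items()}
total = sum(params.values())
fc = params["fc6"] + params["fc7"] + params["fc8"]
print(f"total = {total/1e6:.1f}M, FC share = {fc/total:.0%}")  # ~61M, ~96% in FC
```

The three FC layers hold roughly 96% of the parameters—exactly the mass GAP eliminates.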
11 Training details? 📊 medium
Answer: SGD + momentum, weight decay, learning rate schedule dropping on plateaus—long schedule on two GPUs.
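The paper's exact update rule (v ← 0.9·v − 0.0005·ε·w − ε·∂L/∂w, then w ← w + v) can be sketched on a single scalar weight with made-up gradients:

```python
def sgd_step(w, v, grad, lr=0.01, momentum=0.9, weight_decay=0.0005):
    """One AlexNet-style update: momentum with L2 weight decay folded
    into the velocity term, per the paper's update rule."""
    v = momentum * v - weight_decay * lr * w - lr * grad
    return w + v, v

w, v = 1.0, 0.0
for g in (0.5, 0.4, 0.3):  # fake gradients for three steps
    w, v = sgd_step(w, v, g)
print(w)  # weight decays toward lower loss, velocity accumulating momentum
```

The paper used momentum 0.9 and weight decay 0.0005, starting at lr=0.01 and dividing by 10 when validation error plateaued.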
12 Overfitting risk? 📊 medium
Answer: Large capacity vs data—addressed by dropout, aug, and weight decay; still a concern for smaller datasets when fine-tuning.
13 vs VGG? 📊 medium
Answer: VGG uses uniform 3×3 stacks—deeper and more systematic, with higher accuracy but more compute; AlexNet is shallower with an irregular design.
14 vs ResNet? 📊 medium
Answer: ResNet adds residual (skip) connections, enabling much deeper nets—AlexNet's depth is modest by today's standards.
15 Use AlexNet now? ⚡ easy
Answer: Mostly for teaching/history; ResNet/EfficientNet backbones dominate transfer learning—AlexNet offers worse accuracy for its cost than modern alternatives.
16 Typical input? 📊 medium
Answer: 224×224 crops from 256×256 resized image—standard pipeline referenced in many papers.
17 Output layer? ⚡ easy
Answer: 1000-way softmax for ImageNet classes—cross-entropy loss during training.
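A minimal softmax + cross-entropy sketch, with three toy classes standing in for the 1,000:

```python
import math

def softmax(logits):
    m = max(logits)                       # subtract max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(logits, true_class):
    """Negative log-probability of the true class under the softmax."""
    return -math.log(softmax(logits)[true_class])

logits = [2.0, 1.0, 0.1]                  # 3 classes standing in for 1000
probs = softmax(logits)
print([round(p, 3) for p in probs])       # sums to 1
print(round(cross_entropy(logits, 0), 3))
```

Training minimizes this loss averaged over the batch; the max-subtraction trick keeps `exp` from overflowing on large logits.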
18 Obsolete? ⚡ easy
Answer: For production accuracy, yes; for pedagogy and history, still the canonical “first big win” story.
19 Impact beyond vision? ⚡ easy
Answer: Validated deep learning at scale—influenced the later waves in speech and NLP; proved the GPUs + data + depth recipe.
20 Modern small nets? 📊 medium
Answer: MobileNet and EfficientNet achieve better accuracy per FLOP—mobile/edge deployments rarely use AlexNet-sized FC heads.

AlexNet Cheat Sheet

Breakthrough
  • ImageNet 2012
Ideas
  • ReLU
  • Dropout
Today
  • Historical
  • Superseded

💡 Pro tip: Name ImageNet 2012 + ReLU + dropout + GPUs.
