Regularization MCQ · test your overfitting prevention skills
From L1/L2 to Dropout, Batch Norm, and Data Augmentation – 15 questions covering techniques to generalize better.
Regularization: the art of preventing overfitting
Regularization encompasses techniques that reduce overfitting by adding constraints or noise during training. This MCQ test covers weight penalties (L1/L2), stochastic methods (Dropout), layer-wise normalization (Batch Norm), and data-level approaches (augmentation, early stopping).
Why regularize?
Overfitting occurs when a model learns noise instead of signal. Regularization encourages simpler, more generalizable patterns, improving performance on unseen data.
Regularization glossary – key concepts
L2 Regularization (Weight Decay)
Adds penalty λ∑w² to the loss. Drives weights toward zero but not exactly zero, spreading magnitude across many small weights instead of concentrating it in a few.
L1 Regularization (Lasso)
Adds penalty λ∑|w|. Can lead to sparse weights (some become exactly zero), useful for feature selection.
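The two penalties differ only in the term added to the loss. A minimal pure-Python sketch (the weight values and λ are made up for illustration) shows the computation; the key intuition is that the L1 gradient is a constant ±λ, which can push small weights exactly to zero, while the L2 gradient shrinks along with the weight.

```python
# Hypothetical weight vector and regularization strength, for illustration only
weights = [0.5, -0.25, 0.0, 1.0]
lam = 0.1  # λ

# L2 penalty: λ · Σ w² — penalizes large weights most heavily
l2_penalty = lam * sum(w ** 2 for w in weights)

# L1 penalty: λ · Σ |w| — constant pull toward zero, hence sparsity
l1_penalty = lam * sum(abs(w) for w in weights)
```

In a training loop, either penalty would simply be added to the data loss before backpropagation.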
Dropout
Randomly drops neurons during training (each zeroed with probability p); at inference, dropout is disabled and activations are rescaled so their expected value matches. Prevents co-adaptation and acts like an implicit ensemble of sub-networks.
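The "inverted dropout" variant (the one PyTorch's `nn.Dropout` uses) can be sketched in plain Python; survivors are scaled by 1/(1−p) during training so that no rescaling is needed at inference. The function name and values are illustrative:

```python
import random

def dropout(activations, p=0.5, training=True):
    """Inverted dropout: zero each unit with probability p, scale survivors by 1/(1-p)."""
    if not training or p == 0.0:
        return list(activations)  # inference: identity, no rescaling needed
    return [0.0 if random.random() < p else a / (1.0 - p) for a in activations]

random.seed(0)
out = dropout([1.0, 2.0, 3.0, 4.0], p=0.5)  # each element is either 0.0 or doubled
```

This is also the crux of the "training vs. inference" interview question below: the layer is stochastic during training and a no-op at test time.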
Batch Normalization
Normalizes layer inputs, adds learnable scale/shift. Reduces internal covariate shift and has slight regularization effect.
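The normalization step itself is simple enough to sketch without a framework; this pure-Python version (names and eps value chosen for illustration) normalizes a batch of scalars to zero mean and unit variance, then applies the learnable scale γ and shift β:

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a batch to zero mean / unit variance, then scale by gamma and shift by beta."""
    mean = sum(batch) / len(batch)
    var = sum((x - mean) ** 2 for x in batch) / len(batch)
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]

normed = batch_norm([1.0, 2.0, 3.0, 4.0])
```

In a real layer, γ and β are learned per feature, and running statistics replace the batch statistics at inference.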
Data Augmentation
Generates modified training samples (rotation, flip, noise). Inexpensive way to increase dataset size.
Early Stopping
Stops training when validation performance degrades. Prevents overfitting by limiting the number of optimization steps.
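A common implementation tracks the best validation loss and stops after it fails to improve for a fixed number of epochs (the "patience"). A minimal sketch, with a hypothetical loss curve:

```python
def early_stopping(val_losses, patience=2):
    """Return the epoch at which to stop: val loss hasn't improved for `patience` epochs."""
    best, best_epoch = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch  # stop here; restore the weights saved at best_epoch
    return len(val_losses) - 1  # never triggered: train to the end

# Loss improves through epoch 2, then degrades for 2 epochs -> stop at epoch 4
stop = early_stopping([0.9, 0.7, 0.6, 0.65, 0.7, 0.8])
```

The trade-off: a small patience may stop on a noisy blip before the loss recovers; a large one wastes compute and risks some overfitting.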
Stochastic Depth / DropConnect
Variants of Dropout: dropping entire layers (Stochastic Depth) or dropping individual weights (DropConnect).
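Stochastic Depth is usually applied to residual blocks: during training the whole block is skipped with some probability, leaving only the identity shortcut; at inference the block's output is scaled by its survival probability. A toy sketch (the `survival_prob` value and the `double` "layer" are illustrative):

```python
import random

def residual_block(x, layer, survival_prob=0.8, training=True):
    """Stochastic depth: randomly drop the entire layer from a residual block."""
    if not training:
        return x + survival_prob * layer(x)  # inference: scale by expected survival rate
    if random.random() > survival_prob:
        return x  # layer dropped: only the identity shortcut remains
    return x + layer(x)

double = lambda v: 2 * v  # toy 'layer' standing in for a conv/linear block
y = residual_block(1.0, double, training=False)
```

DropConnect is analogous but finer-grained: instead of zeroing activations (Dropout) or whole layers, it zeroes individual weights with some probability.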
# Common regularization implementations (PyTorch style)
import torch
import torch.nn as nn
from torchvision import transforms

# L2 regularization (via the optimizer's weight_decay argument)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)

# Dropout layer (declared inside an nn.Module's __init__)
self.dropout = nn.Dropout(p=0.5)  # zeroes each activation with probability 0.5 during training

# BatchNorm
self.bn = nn.BatchNorm1d(hidden_dim)

# Data augmentation (torchvision)
transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(10),
    transforms.ToTensor()
])
Common regularization interview questions
- What is the difference between L1 and L2 regularization?
- How does Dropout work during training vs. inference?
- Why does Batch Normalization help with internal covariate shift?
- Can data augmentation replace other regularization?
- Explain the concept of early stopping and its trade-offs.
- What is weight decay and how is it related to L2?