Regularization deep dive · 15 questions · 25 min

Regularization MCQ · test your overfitting prevention skills

From L1/L2 to Dropout, Batch Norm, and data augmentation – 15 questions on techniques that help models generalize better.

Easy: 5 · Medium: 6 · Hard: 4
L1/L2
Dropout
Batch Norm
Augmentation

Regularization: the art of preventing overfitting

Regularization encompasses techniques that reduce overfitting by adding constraints or noise during training. This MCQ test covers weight penalties (L1/L2), stochastic methods (Dropout), layer-wise normalization (Batch Norm), and data-level approaches (augmentation, early stopping).

Why regularize?

Overfitting occurs when a model learns noise instead of signal. Regularization encourages simpler, more generalizable patterns, improving performance on unseen data.

Regularization glossary – key concepts

L2 Regularization (Weight Decay)

Adds penalty λ∑w² to the loss. Drives weights toward zero but rarely exactly zero; encourages small, evenly distributed weights.

L1 Regularization (Lasso)

Adds penalty λ∑|w| to the loss. Can produce sparse weights (some become exactly zero), which is useful for feature selection.
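The different shrinkage behavior of the two penalties can be seen directly in the gradient-descent update. A minimal pure-Python sketch (framework-free, with illustrative values) showing why L2 only shrinks a weight while L1 can zero it out exactly:

```python
# Illustrative sketch: how L1 and L2 penalties shrink a single weight
# during gradient descent (values chosen for demonstration only).

def l2_step(w, grad, lr=0.1, lam=0.5):
    # L2 penalty gradient is 2*lam*w: multiplicative shrinkage,
    # so the weight approaches zero but never lands exactly on it.
    return w - lr * (grad + 2 * lam * w)

def l1_step(w, grad, lr=0.1, lam=0.5):
    # L1 penalty (sub)gradient is lam*sign(w): a constant pull toward zero,
    # implemented here via soft-thresholding so small weights hit exactly 0.
    w = w - lr * grad
    if abs(w) <= lr * lam:
        return 0.0
    return w - lr * lam if w > 0 else w + lr * lam

w2 = w1 = 0.04
for _ in range(50):          # data gradient set to 0: pure penalty effect
    w2 = l2_step(w2, 0.0)
    w1 = l1_step(w1, 0.0)

print(w2 > 0.0)   # True – L2 never reaches zero
print(w1 == 0.0)  # True – L1 sets the weight exactly to zero
```

This is exactly why L1 (Lasso) works as a feature selector: weights whose data gradient cannot overcome the constant λ pull are clamped to zero.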

Dropout

Randomly zeroes neurons during training (each with probability p). Prevents co-adaptation and acts like an implicit ensemble of subnetworks.
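The train/inference distinction is the classic interview follow-up. A small numpy sketch of *inverted* dropout (the variant modern frameworks use), where survivors are rescaled during training so that inference needs no adjustment:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, p=0.5, train=True):
    """Inverted dropout: zero units with probability p, rescale survivors
    by 1/(1-p) so expected activations match between training and inference."""
    if not train:
        return x                       # inference: identity, no scaling needed
    mask = rng.random(x.shape) >= p    # keep each unit with probability 1-p
    return x * mask / (1.0 - p)

x = np.ones(100_000)
y = dropout(x, p=0.5)
print(round(float(y.mean()), 1))   # ≈ 1.0 – expectation is preserved
```

At inference the full network runs unchanged, which is how dropout approximates averaging over the exponentially many subnetworks sampled during training.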

Batch Normalization

Normalizes each layer's inputs over the batch and adds a learnable scale/shift. Reduces internal covariate shift and has a mild regularizing effect.
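The core computation fits in a few lines. A training-mode sketch in numpy (real layers like `nn.BatchNorm1d` additionally track running statistics for inference):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the batch, then apply learnable
    scale (gamma) and shift (beta). Training-mode sketch only."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)   # zero mean, unit variance
    return gamma * x_hat + beta

# Two features on very different scales are brought to the same range
x = np.array([[1.0, 200.0], [3.0, 400.0], [5.0, 600.0]])
out = batch_norm(x, gamma=np.ones(2), beta=np.zeros(2))
print(out.mean(axis=0).round(6))  # ~[0, 0]
print(out.std(axis=0).round(2))   # ~[1, 1]
```

Because every feature lands on a comparable scale regardless of the raw inputs, gradients are better conditioned, which is the usual explanation for faster, more stable convergence.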

Data Augmentation

Generates modified training samples (rotations, flips, noise). An inexpensive way to increase the effective dataset size.

Early Stopping

Stops training when validation performance stops improving. Prevents overfitting by limiting the number of training iterations.
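In practice this is implemented with a *patience* counter and checkpoint restoration. A minimal sketch (the loss sequence and patience value are illustrative):

```python
def train_with_early_stopping(val_losses, patience=2):
    """Stop once validation loss hasn't improved for `patience` epochs;
    return the epoch whose checkpoint you would restore (the best one)."""
    best_loss = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch   # would save checkpoint here
        elif epoch - best_epoch >= patience:
            break                                 # patience exhausted
    return best_epoch, best_loss

# Validation loss dips, then rises as the model starts overfitting
losses = [0.9, 0.7, 0.5, 0.6, 0.65, 0.7]
print(train_with_early_stopping(losses))  # (2, 0.5)
```

The trade-off: patience too small stops at noise-induced dips in validation loss; too large wastes compute and risks drifting past the best checkpoint.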

Stochastic Depth / DropConnect

Dropout variants: Stochastic Depth drops entire layers during training; DropConnect drops individual weights rather than activations.
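The activation-vs-weight distinction is easy to show. A hypothetical `dropconnect_linear` sketch in numpy, masking the weight matrix itself instead of the layer's output:

```python
import numpy as np

rng = np.random.default_rng(42)

def dropconnect_linear(x, W, p=0.5):
    """DropConnect training pass: mask individual *weights* (not activations)
    each forward pass, rescaling by 1/(1-p) so expectations match."""
    mask = rng.random(W.shape) >= p   # independent mask per weight entry
    return x @ (W * mask) / (1.0 - p)

x = np.ones((1, 4))
W = np.ones((4, 3))
out = dropconnect_linear(x, W)
print(out.shape)  # (1, 3)
```

Standard dropout would instead apply one mask entry per output unit; DropConnect's per-weight masking gives a finer-grained (and larger) space of sampled subnetworks.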

# Common regularization implementations (PyTorch style)
import torch
import torch.nn as nn
from torchvision import transforms

# L2 regularization (via the optimizer's weight_decay argument)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)

# Dropout layer (inside an nn.Module's __init__)
self.dropout = nn.Dropout(p=0.5)   # zeroes each activation with probability 0.5

# BatchNorm
self.bn = nn.BatchNorm1d(hidden_dim)

# Data augmentation (torchvision)
transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(10),
    transforms.ToTensor()
])
Interview tip: Be ready to compare L1 vs L2, explain how Dropout approximates model averaging, and discuss why Batch Norm helps convergence. This MCQ covers these distinctions.

Common regularization interview questions

  • What is the difference between L1 and L2 regularization?
  • How does Dropout work during training vs. inference?
  • Why does Batch Normalization help with internal covariate shift?
  • Can data augmentation replace other regularization?
  • Explain the concept of early stopping and its trade-offs.
  • What is weight decay and how is it related to L2?