Dropout & Regularization — 15 Interview Questions
Random neuron masks, inverted dropout, train vs eval, plus L1/L2 weight decay and how they differ from dropout.
1. What is dropout? (Easy)
Answer: During training, each activation is kept with probability 1−p and set to zero otherwise—different random mask each step. Reduces co-adaptation of neurons.
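A minimal NumPy sketch of the training-time mask (the function name and p value are illustrative):

```python
import numpy as np

# Naive (non-inverted) dropout: keep each unit with probability 1 - p
# and zero out the rest; a fresh random mask is drawn on every call.
def dropout_train(x, p, rng):
    mask = rng.random(x.shape) >= p   # True where the unit is kept
    return x * mask                   # dropped activations become exactly zero

rng = np.random.default_rng(0)
h = rng.standard_normal(8)
print(dropout_train(h, p=0.5, rng=rng))  # different zeros on each step
```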
2. Dropout at training vs inference. (Easy)
Answer: Training: apply stochastic mask. Inference: no dropout—use full network. Expectation of output must match; handled by scaling (see inverted dropout or test-time multiply by 1−p).
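In PyTorch, for example, the same nn.Dropout layer switches behavior with the module's train/eval flag (PyTorch's dropout is the inverted variant, so it also rescales at train time):

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(1, 6)

drop.train()     # training mode: stochastic mask plus 1/(1-p) rescaling
print(drop(x))   # a mix of zeros and 2.0

drop.eval()      # eval mode: dropout becomes the identity
print(drop(x))   # all ones, untouched
```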
3. What is inverted dropout? (Medium)
Answer: Scale kept activations by 1/(1−p) during training so inference needs no extra scaling. Common in frameworks—cleaner eval path.
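A NumPy sketch of inverted dropout, assuming p is the drop probability:

```python
import numpy as np

# Rescale kept units at train time so E[output] equals the input;
# inference is then a plain forward pass with no extra scaling.
def inverted_dropout(x, p, rng, training=True):
    if not training or p == 0.0:
        return x                       # eval path: identity
    mask = rng.random(x.shape) >= p    # keep with probability 1 - p
    return x * mask / (1.0 - p)        # scale survivors by 1/(1-p)
```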
4. Ensemble interpretation of dropout. (Medium)
Answer: Training samples many thinned subnets; inference averages over exponentially many such nets—approximated by using the full net with scaled weights. Explains regularizing effect.
5. Where is dropout usually applied? (Easy)
Answer: After fully connected layers, and sometimes after conv layers (less common in modern CNNs); usually not on the output layer. Transformers apply attention dropout to the attention weights/probabilities.
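A sketch of one common placement in a small classifier (layer sizes are arbitrary):

```python
import torch.nn as nn

mlp = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(), nn.Dropout(0.5),
    nn.Linear(256, 256), nn.ReLU(), nn.Dropout(0.5),
    nn.Linear(256, 10),   # output logits: typically no dropout here
)
```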
6. Typical dropout probability p? (Easy)
Answer: Hidden layers often 0.2–0.5; too high hurts capacity. Tune on validation; some architectures (BN-heavy nets) use less dropout.
7. L2 regularization (weight decay): effect. (Medium)
Answer: The penalty λ||w||² encourages smaller weights, smoother functions, and less overfitting. With plain SGD it is equivalent to multiplicatively shrinking the weights each step; AdamW decouples the decay from the adaptive gradient update.
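A PyTorch illustration of the two flavors (the decay values are illustrative, not recommendations):

```python
import torch

model = torch.nn.Linear(10, 1)

# Coupled decay: weight_decay adds lambda*w to the gradient inside SGD,
# equivalent to an L2 penalty term on the loss.
sgd = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=1e-4)

# Decoupled decay: AdamW shrinks the weights directly, separately from
# the adaptive gradient step (the "decoupled weight decay" fix).
adamw = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)
```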
8. L1 vs L2 for neural nets. (Medium)
Answer: L1 encourages sparsity (many exact zeros with subgradient methods). L2 shrinks all weights smoothly. L2 is default; L1 for feature selection or sparse models.
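Frameworks usually build in L2 via weight_decay; L1 is typically added to the loss by hand. A sketch with a hypothetical l1_penalty helper:

```python
import torch
import torch.nn.functional as F

# Hypothetical helper: an L1 term added to the task loss pushes weights
# toward exact zeros (sparsity).
def l1_penalty(model, lam):
    return lam * sum(p.abs().sum() for p in model.parameters())

model = torch.nn.Linear(20, 1)
x, y = torch.randn(4, 20), torch.randn(4, 1)
loss = F.mse_loss(model(x), y) + l1_penalty(model, lam=1e-4)
loss.backward()
```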
9. Monte Carlo dropout at test time: why? (Hard)
Answer: Leave dropout on during inference, average multiple forward passes—approximate predictive uncertainty (Bayesian NN heuristic).
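A sketch that re-enables only the Dropout modules at inference (leaving BatchNorm in eval mode) and averages several stochastic passes; the function name and sample count are illustrative:

```python
import torch
import torch.nn as nn

@torch.no_grad()
def mc_dropout_predict(model, x, n_samples=20):
    model.eval()
    for m in model.modules():
        if isinstance(m, nn.Dropout):
            m.train()                          # keep masks stochastic
    preds = torch.stack([model(x) for _ in range(n_samples)])
    model.eval()
    return preds.mean(0), preds.std(0)         # mean + uncertainty proxy
```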
10. Dropout with batch normalization: what's the interaction? (Hard)
Answer: Order and strength matter; dropout before BN can shift batch statistics. Many modern vision models rely more on BN + data aug than heavy dropout—know it’s architecture-dependent.
11. Spatial dropout in CNNs. (Medium)
Answer: Drop entire feature maps (channels) instead of individual activations. Adjacent activations within a map are strongly correlated, so per-element dropout does little; zeroing whole channels is a stronger structural regularizer.
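In PyTorch this is nn.Dropout2d; a small demonstration that whole channels are zeroed:

```python
import torch
import torch.nn as nn

sd = nn.Dropout2d(p=0.5)          # spatial dropout: zeros entire channels
sd.train()
x = torch.randn(1, 8, 4, 4)       # (batch, channels, H, W)
y = sd(x)
# Each feature map is either all zeros or scaled by 1/(1-p):
print(y[0].abs().sum(dim=(1, 2)) == 0)   # True for the dropped channels
```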
12. Is label smoothing regularization? (Medium)
Answer: Yes—softens targets so the model doesn’t become overconfident; acts on the loss, not weights directly.
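In PyTorch, nn.CrossEntropyLoss takes a label_smoothing argument (available since v1.10); a minimal example:

```python
import torch
import torch.nn as nn

# label_smoothing=0.1 spreads 10% of the target probability mass
# uniformly over the non-target classes.
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)
logits = torch.randn(4, 10)
targets = torch.tensor([0, 3, 7, 1])
print(criterion(logits, targets))
```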
13. Gaussian noise on inputs as regularization. (Easy)
Answer: Adds robustness to small input perturbations—related to data augmentation and Tikhonov-style effects in linear models.
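A one-line train-time sketch (sigma is a tunable hyperparameter; the helper name is illustrative):

```python
import torch

# Train-time-only input perturbation with zero-mean Gaussian noise.
def add_input_noise(x, sigma=0.1, training=True):
    return x + sigma * torch.randn_like(x) if training else x
```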
14. Stochastic depth / drop path (high level). (Hard)
Answer: Randomly skip whole residual branches during training. Similar in spirit to dropout, but applied to the network's structure rather than to individual activations; it regularizes very deep networks.
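A per-sample drop-path sketch for 4D activations, assuming an inverted-dropout-style rescaling (the function is illustrative, not a specific library API):

```python
import torch

# Skip the residual branch with probability p, per sample in the batch,
# and rescale survivors so the expected output is unchanged.
def drop_path(branch_out, p, training):
    if not training or p == 0.0:
        return branch_out
    keep = (torch.rand(branch_out.shape[0], 1, 1, 1,
                       device=branch_out.device) >= p).float()
    return branch_out * keep / (1.0 - p)

# Usage inside a residual block:
#   x = x + drop_path(block(x), p=0.1, training=model.training)
```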
15. When should you prefer dropout vs weight decay? (Medium)
Answer: Often use both, lightly. Dropout targets co-adaptation of activations; weight decay shrinks parameters. Large-data, BN-heavy models may need little dropout; small-data fully connected nets benefit more.
State clearly: dropout is off at eval unless you are doing MC dropout.
Quick review checklist
- Dropout train vs eval; inverted dropout; ensemble view.
- L1 vs L2; AdamW decoupling mention; spatial dropout.
- MC dropout; interaction with BN at high level.