Weight Initialization — 15 Interview Questions
Xavier/Glorot, He, fan-in and fan-out, why symmetry must break, and how init interacts with activation and depth.
Topics: Variance, Xavier, He, Random
1. Why can’t we set all weights to zero? (Easy)
Answer: Every neuron in a layer stays identical: same outputs, same gradients, same updates, so the symmetry never breaks. You need random (or otherwise asymmetric) init so that units can specialize.
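A minimal NumPy sketch of the symmetry problem (toy sizes and names are illustrative, not from the source): two hidden units that start with identical weights receive identical gradients and can never diverge.

import numpy as np

# Tiny 2-2-1 network; both hidden units start with the SAME (here zero) weights.
x = np.array([1.0, 2.0])
W1 = np.zeros((2, 2))            # identical rows -> identical hidden units
w2 = np.array([0.5, 0.5])

h = np.tanh(W1 @ x)              # both hidden activations are equal
y = w2 @ h

# Backprop of a squared-error loss against target t
t = 1.0
dy = y - t
dh = dy * w2 * (1 - h ** 2)      # gradient w.r.t. hidden pre-activations
dW1 = np.outer(dh, x)

print(dW1)                       # both rows are identical, so the units update in lockstep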
2. How are biases usually initialized? (Easy)
Answer: Zeros are usually fine for biases, since symmetry breaking comes from the weights. A small positive bias is sometimes used before ReLU so fewer neurons start out dead.
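A hedged PyTorch sketch of the usual pattern (the 0.01 value is illustrative): zero biases by default, optionally a small positive bias before ReLU.

import torch.nn as nn

layer = nn.Linear(128, 64)
nn.init.kaiming_normal_(layer.weight, nonlinearity='relu')
nn.init.zeros_(layer.bias)              # zeros are usually fine for biases
# Optional ReLU trick: a small positive bias so fewer units start out dead
# nn.init.constant_(layer.bias, 0.01)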
3. What are fan-in and fan-out? (Easy)
Answer: For one neuron, fan-in = number of incoming connections (its input dimension); fan-out = number of outgoing connections (how many units in the next layer it feeds). Init schemes scale the weight variance using these counts.
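One way to compute both quantities by hand, sketched in PyTorch terms (the helper name is illustrative; the arithmetic is what interviews test):

import torch.nn as nn

def fan_in_fan_out(layer):
    if isinstance(layer, nn.Linear):
        return layer.in_features, layer.out_features
    if isinstance(layer, nn.Conv2d):
        kh, kw = layer.kernel_size
        receptive = kh * kw
        return layer.in_channels * receptive, layer.out_channels * receptive
    raise TypeError("unsupported layer type")

print(fan_in_fan_out(nn.Linear(256, 128)))   # (256, 128)
print(fan_in_fan_out(nn.Conv2d(3, 64, 3)))   # (27, 576)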
4. Xavier / Glorot initialization: the idea. (Medium)
Answer: Choose the weight variance so that activation variance stays roughly stable in the forward pass and gradient variance in the backward pass, typically by sampling from a uniform or normal distribution with variance ∝ 1/fan_avg. Suited to tanh/sigmoid, which are roughly linear near 0.
Var(W) ≈ 2 / (fan_in + fan_out) (common form)
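A minimal NumPy sketch of Glorot normal using that formula (function name is illustrative):

import numpy as np

rng = np.random.default_rng(0)

def xavier_normal(fan_in, fan_out):
    std = np.sqrt(2.0 / (fan_in + fan_out))   # Var(W) = 2 / (fan_in + fan_out)
    return rng.normal(0.0, std, size=(fan_out, fan_in))

W = xavier_normal(512, 256)
print(W.std())                                # close to sqrt(2/768) ≈ 0.051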
5. He initialization: for which activation? (Medium)
Answer: Designed for ReLU: roughly half the activations are zeroed out, so the weight variance is doubled and scaled by fan_in only (Var ≈ 2/fan_in). This keeps the signal from dying or exploding early in deep ReLU nets.
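The same idea with the ReLU-specific scale, next to the PyTorch call that implements it (a sketch assuming fan_in mode):

import numpy as np
import torch.nn as nn

rng = np.random.default_rng(0)

def he_normal(fan_in, fan_out):
    std = np.sqrt(2.0 / fan_in)               # Var(W) = 2 / fan_in for ReLU
    return rng.normal(0.0, std, size=(fan_out, fan_in))

layer = nn.Linear(512, 256)
nn.init.kaiming_normal_(layer.weight, mode='fan_in', nonlinearity='relu')   # framework equivalent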
6. What goes wrong with too large or too small random init? (Medium)
Answer: Too large: activations/gradients explode. Too small: activations vanish, gradients tiny—slow learning. Good init keeps scale in a reasonable band across layers.
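A quick NumPy experiment (illustrative, not from the source) that makes the band visible: push a signal through 30 tanh layers and watch the activation scale. With tanh, a too-large init shows up as saturation, which kills gradients in the backward pass.

import numpy as np

rng = np.random.default_rng(0)
x0 = rng.normal(size=(256,))

for std in (0.01, np.sqrt(1.0 / 256), 0.5):   # too small / Xavier-ish / too large
    x = x0.copy()
    for _ in range(30):
        W = rng.normal(0.0, std, size=(256, 256))
        x = np.tanh(W @ x)
    saturated = np.mean(np.abs(x) > 0.99)
    print(f"init std={std:.4f}  activation std={x.std():.2e}  saturated={saturated:.0%}")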
7. LeCun normal / uniform, in one line. (Medium)
Answer: Another fan-in-based scaling (e.g. std = 1/√fan_in) to preserve variance; similar family to Xavier/He with different constants for different assumptions.
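The three fan-based scales side by side (a sketch; the constants are the standard ones):

import numpy as np

fan_in, fan_out = 512, 256
print("LeCun  std:", np.sqrt(1.0 / fan_in))              # linear/SELU assumptions
print("Xavier std:", np.sqrt(2.0 / (fan_in + fan_out)))  # tanh/sigmoid
print("He     std:", np.sqrt(2.0 / fan_in))              # ReLU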
8. Orthogonal initialization: when is it mentioned? (Hard)
Answer: Start with orthogonal weight matrices so singular values start near 1—helps very deep nets or RNNs with gradient flow. Less default than Xavier/He for vanilla CNN/MLP.
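A quick check of that property, assuming a reasonably recent PyTorch (torch.linalg.svdvals):

import torch
import torch.nn as nn

W = torch.empty(256, 256)
nn.init.orthogonal_(W)
print(torch.linalg.svdvals(W)[:5])   # all singular values are ~1.0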
9. Transfer learning: how does “initialization” change? (Easy)
Answer: Load pretrained weights instead of random—only new head layers need fresh init. Fine-tuning uses small LR so pretrained init isn’t destroyed immediately.
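A self-contained sketch of the pattern (a dummy two-layer model stands in for a real pretrained backbone; all names are illustrative): copy whatever weights match, then freshly initialize only the new head.

import torch.nn as nn

def make_model(num_classes):
    return nn.Sequential(
        nn.Linear(32, 64), nn.ReLU(),   # "backbone"
        nn.Linear(64, num_classes),     # "head"
    )

pretrained_state = make_model(num_classes=1000).state_dict()   # stands in for downloaded weights

model = make_model(num_classes=10)      # new task, different head size
own_state = model.state_dict()
compatible = {k: v for k, v in pretrained_state.items()
              if k in own_state and v.shape == own_state[k].shape}
model.load_state_dict(compatible, strict=False)   # backbone keeps pretrained values

# Only the new head gets fresh random init.
nn.init.xavier_uniform_(model[2].weight)
nn.init.zeros_(model[2].bias)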
10. Does batch norm make initialization less critical? (Medium)
Answer: Partly—BN stabilizes activations so training is less sensitive to exact scale. You still avoid pathological init; bad init can still hurt before BN statistics stabilize.
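A tiny PyTorch check of the claim (a sketch; std=1.0 is deliberately pathological): BN pulls the layer output back to roughly unit scale even when the init is far too large.

import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(512, 256)

lin = nn.Linear(256, 256)
nn.init.normal_(lin.weight, std=1.0)           # deliberately far too large

raw = lin(x)
normed = nn.BatchNorm1d(256)(raw)              # training mode: uses batch statistics
print(raw.std().item(), normed.std().item())   # ~16 vs ~1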
11. Residual blocks: why is the last conv layer sometimes initialized to zero? (Hard)
Answer: Some designs initialize the last conv in a block near zero so the block starts as near-identity (skip path dominates), improving optimization of very deep nets.
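A hedged PyTorch sketch of the near-identity trick (the block structure is illustrative, not a specific paper's design): zeroing the last conv makes the block output exactly the skip path at init.

import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, 3, padding=1)
        self.conv2 = nn.Conv2d(ch, ch, 3, padding=1)
        nn.init.zeros_(self.conv2.weight)   # block starts as (almost) the identity
        nn.init.zeros_(self.conv2.bias)

    def forward(self, x):
        return x + self.conv2(torch.relu(self.conv1(x)))

x = torch.randn(1, 16, 8, 8)
print(torch.allclose(ResBlock(16)(x), x))   # True at initialization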
12. What is a “gain” or activation-specific multiplier? (Medium)
Answer: Frameworks multiply the base scale by a nonlinearity-specific constant (e.g. one that depends on the Leaky ReLU slope) so the variance is adjusted for that activation; the He/Xavier formulas already include these gains.
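PyTorch exposes these multipliers directly via nn.init.calculate_gain (a short sketch):

import torch.nn as nn

print(nn.init.calculate_gain('tanh'))               # 5/3
print(nn.init.calculate_gain('relu'))               # sqrt(2)
print(nn.init.calculate_gain('leaky_relu', 0.2))    # sqrt(2 / (1 + 0.2**2))

w = nn.Linear(256, 128).weight
nn.init.xavier_normal_(w, gain=nn.init.calculate_gain('tanh'))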
13. What is fan_in for a conv layer? (Medium)
Answer: Typically k_h × k_w × in_channels per filter—number of multiply-add inputs contributing to one output activation before bias.
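Worked example (numbers are illustrative): a 3×3 conv over 64 input channels has fan_in = 3 × 3 × 64 = 576, regardless of how many output filters the layer has.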
14. Why set a random seed in experiments? (Easy)
Answer: Reproducibility: the same init and data shuffling make comparisons fair. Seeding does not remove variance across seeds, so best practice is to report results over multiple seeds in papers.
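A typical seeding snippet (hedged: the exact switches needed for full determinism vary by library, hardware, and op set):

import random
import numpy as np
import torch

SEED = 42
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
# Some CUDA kernels remain nondeterministic; torch.use_deterministic_algorithms(True)
# tightens this further at a speed cost.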
15. Default init you’d name in an interview? (Easy)
Answer: He for ReLU-based CNN/MLP; Xavier for tanh/sigmoid-heavy nets; use framework defaults (kaiming_uniform, xavier_normal) and match them to the activation. Mention fan_in, the activation, and symmetry breaking: the three anchors interviewers expect.
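One way to wire that answer into code, applied with Module.apply (a sketch with illustrative rules; adjust to the actual activations in the network):

import torch.nn as nn

def init_weights(module, activation='relu'):
    if isinstance(module, (nn.Linear, nn.Conv2d)):
        if activation == 'relu':
            nn.init.kaiming_uniform_(module.weight, nonlinearity='relu')
        else:                                    # tanh / sigmoid style nets
            nn.init.xavier_normal_(module.weight)
        if module.bias is not None:
            nn.init.zeros_(module.bias)

model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU(),
                      nn.Flatten(), nn.Linear(16 * 30 * 30, 10))
model.apply(init_weights)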
Quick review checklist
- Zero weights; fan-in/fan-out; Xavier vs He.
- Too large/small random; BN interaction; pretrained as init.
- Conv fan_in; seed for reproducibility.