Neural Networks: 15 Essential Q&A
Interview Prep

Forward Propagation — 15 Interview Questions

From input to logits: layer order, tensor shapes, batching, inference vs training mode, and how interviewers test your mental model of the forward pass.


Topics: Inference, Shapes, FLOPs, Activations
1. What is forward propagation? (Easy)
Answer: Computing the network’s output from the input by applying each layer in order: affine transforms, biases, activations, pooling, etc., with no weight updates. During training the result feeds the loss; at deploy time it is pure inference.
2. Forward vs backward pass in one sentence each. (Easy)
Answer: Forward: compute outputs and (usually) cache intermediates for loss. Backward: apply chain rule to get gradients for learning. Forward does not change weights; backward supplies the update signal.
3. One step of an MLP layer in forward form. (Easy)
Answer: z = Wx + b, then a = f(z) for activation f. For a batch, X is stacked rows and the same W applies to each.
z = Wx + b,  a = f(z)
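As a concrete reference, a minimal NumPy sketch of that single step is below; ReLU stands in for the generic activation f, and the names x, W, b follow the column-vector convention above (all purely illustrative).

    import numpy as np

    def layer_forward(x, W, b):
        """One MLP layer: affine transform z = Wx + b, then activation a = f(z)."""
        z = W @ x + b             # pre-activation, W has shape (d_out, d_in)
        a = np.maximum(z, 0)      # post-activation with ReLU as the example f
        return a

    x = np.random.randn(4)        # d_in = 4
    W = np.random.randn(3, 4)     # d_out = 3
    b = np.zeros(3)
    print(layer_forward(x, W, b).shape)   # (3,)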
4. Shape of X, W, and the output for a batched linear layer. (Medium)
Answer: X: B × d_in, W: d_in × d_out, bias b: d_out (broadcast). Output Y: B × d_out with Y = XW + b (row-wise).
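A quick shape check in NumPy (the sizes 32/128/64 are arbitrary; the point is the broadcast bias and the B × d_out result):

    import numpy as np

    B, d_in, d_out = 32, 128, 64
    X = np.random.randn(B, d_in)     # batch of row vectors
    W = np.random.randn(d_in, d_out)
    b = np.zeros(d_out)              # broadcast across the B rows

    Y = X @ W + b                    # row-wise: every sample hits the same W
    assert Y.shape == (B, d_out)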
5. Rough FLOPs for a matrix multiply A (m×k) · B (k×n)? (Medium)
Answer: Dominant term is O(m·k·n) multiply-adds (often quoted as ~2mkn FLOPs if counting mul+add separately). Used to reason about layer cost in forward pass.
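A back-of-envelope helper makes this concrete (the example sizes are made up):

    def matmul_flops(m, k, n):
        """Approximate FLOPs for an (m x k) @ (k x n) product, counting mul + add."""
        return 2 * m * k * n

    # e.g. a batched linear layer with B=32, d_in=1024, d_out=4096
    print(matmul_flops(32, 1024, 4096))   # 268,435,456, roughly 0.27 GFLOPs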
6. Why must layers be applied in a fixed order? (Easy)
Answer: Each layer’s input is the previous layer’s output. Reordering changes the composed function entirely unless the architecture is specially designed (e.g. parallel branches with merge).
7. What activations are often cached during the forward pass in training? (Medium)
Answer: Pre-activations z and post-activations a (plus the inputs BatchNorm needs for its statistics), so backprop can compute local gradients without recomputing everything. Frameworks handle this in autograd.
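To make the caching explicit, here is a hand-rolled sketch of what autograd does behind the scenes; returning the cache alongside the output is purely illustrative:

    import numpy as np

    def linear_relu_forward(x, W, b):
        """Forward for one layer, keeping the values the backward pass will need."""
        z = x @ W + b              # pre-activation (cached)
        a = np.maximum(z, 0)       # post-activation (cached)
        cache = (x, W, z)          # local gradients only need these, no recomputation
        return a, cache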
8. How does eval() / inference mode change forward behavior? (Medium)
Answer: Dropout disabled (or scaled). BatchNorm uses running mean/var not batch stats. No gradient tracking needed—saves memory and compute.
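A minimal PyTorch sketch of the inference-mode pattern (the toy layer sizes are arbitrary):

    import torch
    from torch import nn

    model = nn.Sequential(nn.Linear(8, 16), nn.BatchNorm1d(16), nn.ReLU(),
                          nn.Dropout(p=0.5), nn.Linear(16, 4))
    batch = torch.randn(32, 8)

    model.eval()                   # dropout disabled, BatchNorm uses running stats
    with torch.no_grad():          # no autograd graph: saves memory and compute
        preds = model(batch)
    print(preds.shape)             # torch.Size([32, 4])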
9. Why subtract the max before softmax in practice? (Hard)
Answer: Logits can be large, so e^z overflows. z' = z − max(z) shifts the logits without changing the softmax output (softmax is shift-invariant) while keeping the exponentials bounded: numerical stability.
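A stable softmax in NumPy, showing the shift leaves the output unchanged while a naive version would overflow on these logits:

    import numpy as np

    def stable_softmax(z):
        z = z - np.max(z, axis=-1, keepdims=True)   # largest exponent is now exp(0) = 1
        e = np.exp(z)
        return e / e.sum(axis=-1, keepdims=True)

    print(stable_softmax(np.array([1000.0, 1001.0, 1002.0])))   # finite, sums to 1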
10. Forward pass for batch size 1 vs a large B: same code path? (Easy)
Answer: Usually yes—B=1 is a degenerate batch; matrix ops still work. Some ops (e.g. BN) behave differently with tiny batch size; that’s a practical caveat.
11. What drives memory during the forward pass (training)? (Medium)
Answer: Storing activations for backprop, plus optimizer state if updating. Wider/deeper nets and larger batch increase activation memory—often the bottleneck before weights.
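A back-of-envelope estimate for one cached activation tensor (float32, illustrative sizes); the total grows with depth because every layer's output is kept for backprop:

    B, d_out = 256, 4096
    bytes_per_value = 4                                   # float32
    activation_mb = B * d_out * bytes_per_value / 1e6
    print(f"{activation_mb:.1f} MB per cached tensor")    # ~4.2 MB for this one layer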
12. “Functional” forward: what does it mean in frameworks? (Medium)
Answer: Applying ops with explicit weight tensors passed in (e.g. F.linear(x, W, b)) instead of nn.Module parameters—same math, useful for meta-learning or custom graphs.
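A small PyTorch example of the functional form; note that F.linear expects the weight as (d_out, d_in):

    import torch
    import torch.nn.functional as F

    x = torch.randn(32, 128)
    W = torch.randn(64, 128, requires_grad=True)   # (d_out, d_in)
    b = torch.zeros(64, requires_grad=True)

    y = F.linear(x, W, b)    # same math as nn.Linear(128, 64), weights passed explicitly
    print(y.shape)           # torch.Size([32, 64])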
13. Mixed-precision forward: what changes? (Hard)
Answer: Many ops run in float16/bfloat16 for speed; sensitive reductions (loss, BN) may stay in float32. Loss scaling can help with small gradients in low precision.
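A minimal autocast sketch; on GPU the low-precision dtype is typically float16, here bfloat16 on CPU so the snippet runs anywhere (loss scaling with GradScaler belongs to the backward pass and is not shown):

    import torch
    from torch import nn

    model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))
    x = torch.randn(32, 128)

    with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
        logits = model(x)    # matmul-heavy ops run in low precision inside this block
    print(logits.dtype)      # torch.bfloat16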
14. Exported model “inference graph”: relation to the forward pass? (Medium)
Answer: It is a frozen forward computation graph (no backward), optimized for deployment—same layer order as training forward, possibly fused ops.
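One export route as a sketch: TorchScript tracing records a single forward pass and freezes it (ONNX export is an alternative; the file name is illustrative):

    import torch
    from torch import nn

    model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4)).eval()
    example = torch.randn(1, 8)

    traced = torch.jit.trace(model, example)   # frozen forward graph, no backward
    traced.save("mlp_inference.pt")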
15. Walk through a 3-layer MLP forward from x to class probabilities. (Easy)
Answer: x → h1 = f(W1x+b1) → h2 = f(W2h1+b2) → logits = W3h2+b3 → probs = softmax(logits). Note that the last layer outputs raw logits: no hidden activation sits between it and the softmax.
Draw arrows on a whiteboard—interviewers check you separate linear blocks from f and softmax.
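A compact NumPy version of the full walkthrough, using the row-batch convention (x @ W) and ReLU as f; all shapes are toy values:

    import numpy as np

    def softmax(z):
        z = z - z.max(axis=-1, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=-1, keepdims=True)

    def mlp_forward(x, params, f=lambda z: np.maximum(z, 0)):
        """Two nonlinear hidden layers, a final linear layer, then softmax."""
        W1, b1, W2, b2, W3, b3 = params
        h1 = f(x @ W1 + b1)
        h2 = f(h1 @ W2 + b2)
        logits = h2 @ W3 + b3            # no activation on the last linear layer
        return softmax(logits)

    rng = np.random.default_rng(0)       # toy shapes: 4 -> 8 -> 8 -> 3 classes
    params = (rng.normal(size=(4, 8)), np.zeros(8),
              rng.normal(size=(8, 8)), np.zeros(8),
              rng.normal(size=(8, 3)), np.zeros(3))
    probs = mlp_forward(rng.normal(size=(2, 4)), params)
    print(probs.sum(axis=-1))            # each row sums to 1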

Quick review checklist

  • Define forward vs backward; one MLP layer: z, a, batch shapes.
  • Training caches; eval mode: dropout off, BN running stats.
  • Softmax stability; FLOPs order for matmul; memory = activations.