Backpropagation deep dive · 15 questions · 25 min

Backpropagation MCQ · test your gradient flow knowledge

From chain rule to computational graphs – 15 questions covering the mechanics of backprop, vanishing gradients, and optimisation.

Easy: 5 · Medium: 6 · Hard: 4
Chain rule
Gradient descent
Computation graph
Vanishing gradient

Backpropagation: the engine of deep learning

Backpropagation, short for "backward propagation of errors," is the algorithm that computes gradients of the loss function with respect to every weight in a neural network. It applies the chain rule repeatedly, propagating error signals from the output layer back to the input. This MCQ tests your understanding of the chain rule, computational graphs, gradient flow issues, and practical aspects of training.

Why backprop matters

Without efficient gradient computation, training deep networks would be infeasible. Backprop leverages the chain rule to compute exact gradients at a cost proportional to a single forward pass, regardless of how many parameters the network has.

Backprop glossary – key concepts

Chain rule

Calculus principle: the derivative of a composite function is the product of the local derivatives, d/dx f(g(x)) = f'(g(x)) · g'(x). Backprop applies it from output to input.
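The chain rule can be verified numerically. A minimal sketch (the functions and values are illustrative): for L(x) = (3x)², the analytic derivative 2u·3 from chaining the two local derivatives should match a finite-difference estimate.

```python
def g(x):          # inner function: u = 3x
    return 3 * x

def f(u):          # outer function: L = u^2
    return u ** 2

x = 2.0
u = g(x)
analytic = 2 * u * 3   # chain rule: dL/du * du/dx = 2u * 3

# central finite difference as an independent check
eps = 1e-6
numeric = (f(g(x + eps)) - f(g(x - eps))) / (2 * eps)
# analytic and numeric both come out to ~36.0
```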

Computational graph

Directed graph representing operations and dependencies. Each node gets a gradient during backward pass.
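A backward pass over a computational graph can be traced by hand. A small sketch for y = (a + b) * c (values chosen arbitrarily): each node receives the gradient of the loss with respect to its output and multiplies in its local derivative.

```python
# Forward pass: two nodes, add then multiply.
a, b, c = 2.0, 3.0, 4.0
s = a + b        # node 1: s = a + b
y = s * c        # node 2: y = s * c

# Backward pass: seed with dL/dy = 1, propagate node by node.
dy = 1.0
ds = dy * c      # multiply node: dy/ds = c
dc = dy * s      # multiply node: dy/dc = s
da = ds * 1.0    # add node passes the gradient through unchanged
db = ds * 1.0
# da = 4.0, db = 4.0, dc = 5.0
```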

Vanishing gradient

Gradients become very small in early layers, slowing learning. Common with sigmoid/tanh.
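The effect is easy to demonstrate: the sigmoid derivative σ'(z) = σ(z)(1 − σ(z)) peaks at 0.25, so a gradient passing backward through n sigmoid layers shrinks by at least 0.25 per layer. A small illustration (a 10-layer chain at the most favorable point, z = 0):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

grad = 1.0
for layer in range(10):
    a = sigmoid(0.0)          # activation at z = 0, the best case
    grad *= a * (1 - a)       # multiply in the local derivative (0.25 here)
# grad = 0.25**10, roughly 9.5e-7 -- early layers barely learn
```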

Exploding gradient

Gradients grow exponentially, causing unstable updates. Gradient clipping helps.
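Clipping by global norm is one common variant: if the L2 norm of all gradients exceeds a threshold, rescale them so the direction is preserved but the magnitude is bounded. A minimal sketch (the helper name and values are illustrative):

```python
import math

def clip_by_global_norm(grads, max_norm):
    """Rescale grads so their combined L2 norm is at most max_norm."""
    total = math.sqrt(sum(g * g for g in grads))
    if total > max_norm:
        scale = max_norm / total
        return [g * scale for g in grads]
    return grads

grads = [30.0, 40.0]                    # global norm 50
clipped = clip_by_global_norm(grads, 5.0)
# clipped = [3.0, 4.0]: same direction, norm reduced to 5.0
```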

Automatic differentiation

Technique for computing exact derivatives of code; frameworks such as TensorFlow and PyTorch implement it. Two modes: forward and reverse (backprop is reverse mode, which is efficient when there are many inputs and few outputs).
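Forward mode, the counterpart to backprop, can be sketched with dual numbers: each value carries its derivative alongside it, and every operation updates both. A toy illustration (the `Dual` class is a hypothetical minimal implementation, not a framework API):

```python
class Dual:
    """A value paired with its derivative, propagated forward."""
    def __init__(self, val, dot):
        self.val, self.dot = val, dot

    def __add__(self, other):
        return Dual(self.val + other.val, self.dot + other.dot)

    def __mul__(self, other):
        # product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

# Differentiate f(x) = x*x + x at x = 3 by seeding dx/dx = 1.
x = Dual(3.0, 1.0)
y = x * x + x
# y.val = 12.0, y.dot = 7.0 (matches f'(x) = 2x + 1)
```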

Gradient flow

How gradients propagate through layers; hindered by saturating activations, poor initialisation, or very deep unnormalised stacks.

import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Conceptual backprop for a single neuron: z = w*x + b, a = sigmoid(z),
# squared-error loss L = 0.5*(y - a)**2. Values are arbitrary examples.
w, x, b, y = 0.5, 1.0, 0.0, 1.0

# forward pass
z = w * x + b
a = sigmoid(z)
L = 0.5 * (y - a) ** 2

# backward pass (reverse mode): chain the local derivatives
dL_da = a - y            # dL/da
da_dz = a * (1 - a)      # sigmoid derivative
dL_dz = dL_da * da_dz    # dL/dz
dL_dw = dL_dz * x        # dL/dw
dL_db = dL_dz            # dL/db

Interview tip: Be ready to walk through a simple computational graph, explain why ReLU helps with vanishing gradients, and discuss gradient checking. This MCQ covers these foundational topics.
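Gradient checking, mentioned in the tip above, compares an analytic gradient against a central finite difference; a close match is strong evidence the backward pass is correct. A sketch for the single-neuron setup (example values chosen arbitrarily):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(w, x, b, y):
    return 0.5 * (y - sigmoid(w * x + b)) ** 2

w, x, b, y = 0.5, 1.0, 0.0, 1.0

# analytic gradient dL/dw via the chain rule
a = sigmoid(w * x + b)
analytic = (a - y) * a * (1 - a) * x

# numeric gradient: central finite difference in w
eps = 1e-5
numeric = (loss(w + eps, x, b, y) - loss(w - eps, x, b, y)) / (2 * eps)
# the two agree to within ~1e-9 when the analytic gradient is right
```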

Common backprop interview questions

  • Explain the chain rule in the context of neural networks.
  • What is a computational graph and how is it used in backprop?
  • Why do sigmoid/tanh activations cause vanishing gradients?
  • How does backprop differ from forward-mode automatic differentiation?
  • What is gradient clipping and when is it used?
  • Describe the role of the loss function gradient in weight updates.