Backpropagation MCQ · test your gradient flow knowledge
From chain rule to computational graphs – 15 questions covering the mechanics of backprop, vanishing gradients, and optimisation.
Backpropagation: the engine of deep learning
Backpropagation, short for "backward propagation of errors," is the algorithm that computes gradients of the loss function with respect to every weight in a neural network. It applies the chain rule repeatedly, propagating error signals from the output layer back to the input. This MCQ tests your understanding of the chain rule, computational graphs, gradient flow issues, and practical aspects of training deep networks.
Why backprop matters
Without efficient gradient computation, training deep networks would be infeasible. Backprop leverages the chain rule to compute exact gradients for every parameter in a single backward sweep, at a cost of the same order as the forward pass.
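A minimal sketch of the idea, using a made-up function f(x) = (3x + 2)² chosen purely for illustration: each operation records its local derivative on the way forward, and one backward sweep multiplies them together.

```python
# Reverse-mode sketch: one backward sweep yields d(output)/d(input)
# as the product of local derivatives, one per operation.
# Illustrative function: f(x) = (3x + 2)**2, as u = 3x + 2, f = u**2.
x = 1.5
u = 3 * x + 2          # forward: inner op
f = u ** 2             # forward: outer op

df_du = 2 * u          # backward: local derivative of the outer op
du_dx = 3.0            # backward: local derivative of the inner op
df_dx = df_du * du_dx  # chain rule: product of local derivatives

# check against the analytic expansion f(x) = 9x^2 + 12x + 4, f'(x) = 18x + 12
assert abs(df_dx - (18 * x + 12)) < 1e-9
```

The same pattern scales to whole networks: every node stores its local derivative, and gradients for all parameters fall out of one backward traversal.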
Backprop glossary – key concepts
Chain rule
Calculus principle: the derivative of a composite function is the product of the derivatives of its parts. Backprop applies it repeatedly, from output to input.
Computational graph
Directed graph representing operations and dependencies. Each node gets a gradient during backward pass.
Vanishing gradient
Gradients shrink as they propagate back to early layers, slowing or stalling learning. Common with sigmoid/tanh, whose derivatives are at most 0.25 and 1 respectively, so repeated multiplication drives the signal toward zero.
Exploding gradient
Gradients grow exponentially, causing unstable updates. Gradient clipping helps.
Automatic differentiation
Technique for computing exact derivatives of programs; frameworks such as TensorFlow and PyTorch implement it. Two modes: forward and reverse (backprop is reverse mode).
Gradient flow
How gradients propagate backward through layers; obstructed by saturating activations or poorly scaled weights.
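The glossary entries on vanishing and exploding gradients can both be demonstrated in a few lines. The depth of 10 and the clipping threshold below are arbitrary illustrative choices.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Vanishing gradients: each sigmoid layer multiplies the backward signal
# by sigma'(z) <= 0.25, so the product shrinks geometrically with depth.
grad = 1.0
for layer in range(10):
    a = sigmoid(0.0)        # sigma'(0) = 0.25 is the *best* case
    grad *= a * (1 - a)
print(grad)                 # 0.25**10, roughly 1e-6

# Exploding gradients: clipping by global norm rescales oversized gradients
# while preserving their direction.
def clip_by_norm(grads, max_norm):
    norm = math.sqrt(sum(g * g for g in grads))
    if norm > max_norm:
        scale = max_norm / norm
        return [g * scale for g in grads]
    return grads
```

Clipping by norm (rather than clipping each component independently) keeps the update pointing the same way, just shorter.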
```python
# Conceptual backprop for a single neuron: z = w*x + b, a = sigmoid(z),
# squared-error loss L = 0.5 * (y - a)**2.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

w, b = 0.5, 0.1          # example parameters
x, y = 2.0, 1.0          # example input and target

# forward pass
z = w * x + b
a = sigmoid(z)
L = 0.5 * (y - a) ** 2

# backward pass (reverse mode)
dL_da = a - y            # derivative of 0.5*(y - a)**2 w.r.t. a
da_dz = a * (1 - a)      # sigmoid derivative
dL_dz = dL_da * da_dz
dL_dw = dL_dz * x        # gradient for the weight
dL_db = dL_dz            # gradient for the bias
```
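A standard way to sanity-check a hand-derived backward pass like the one above is a finite-difference comparison; the example values here are arbitrary.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(w, b, x, y):
    """Forward pass only: squared-error loss of a single sigmoid neuron."""
    a = sigmoid(w * x + b)
    return 0.5 * (y - a) ** 2

w, b, x, y = 0.5, 0.1, 2.0, 1.0

# analytic gradient, same chain rule as the backward pass above
a = sigmoid(w * x + b)
dL_dw = (a - y) * a * (1 - a) * x

# numeric gradient via central differences
eps = 1e-6
num_dw = (loss(w + eps, b, x, y) - loss(w - eps, b, x, y)) / (2 * eps)

assert abs(dL_dw - num_dw) < 1e-6   # the two estimates should agree closely
```

Gradient checks like this are too slow for training but invaluable when debugging a custom layer's backward implementation.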
Common backprop interview questions
- Explain the chain rule in the context of neural networks.
- What is a computational graph and how is it used in backprop?
- Why do sigmoid/tanh activations cause vanishing gradients?
- How does backprop differ from forward-mode automatic differentiation?
- What is gradient clipping and when is it used?
- Describe the role of the loss function gradient in weight updates.
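One of the questions above contrasts backprop with forward-mode automatic differentiation. A minimal dual-number sketch of forward mode (an illustrative toy class, not any framework's API) makes the difference concrete: each forward-mode pass yields the derivative with respect to one seeded input, whereas reverse mode yields the gradient with respect to all inputs in one backward pass.

```python
# Forward-mode AD via dual numbers: carry (value, derivative) together.
class Dual:
    def __init__(self, val, dot):
        self.val, self.dot = val, dot
    def __add__(self, other):
        return Dual(self.val + other.val, self.dot + other.dot)
    def __mul__(self, other):
        # product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.val * other.dot + self.dot * other.val)

# f(x1, x2) = x1*x2 + x1; seed x1 with dot=1 to get df/dx1
x1 = Dual(3.0, 1.0)
x2 = Dual(4.0, 0.0)
f = x1 * x2 + x1
print(f.dot)   # df/dx1 = x2 + 1 = 5.0
```

Getting df/dx2 as well would require a second pass with x2 seeded instead, which is why reverse mode wins when there are many inputs (weights) and one scalar output (the loss).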