Forward Propagation
Forward propagation (inference) is the process of passing input data through every layer—linear maps, biases, activations—to obtain predictions. There is no weight update. This page stresses the batched matrix view, shape checking, and how to run inference efficiently in PyTorch.
Key ideas: inference, mini-batch, tensor shapes, no_grad.
What Forward Propagation Does
Given fixed weights, forward propagation answers: "What output does this network produce for this input?" During training, you also forward-propagate to compute the loss, then backpropagate gradients. During deployment, you often only need the forward pass (sometimes with quantization or smaller models for speed).
The layer recurrence is A⁽ˡ⁾ = f(A⁽ˡ⁻¹⁾ W⁽ˡ⁾ + b⁽ˡ⁾), where f is the layer's activation. Here A⁽⁰⁾ is the input matrix X with shape (N, d₀). Weight matrix W⁽ˡ⁾ has shape (d_{l−1}, d_l) so that A⁽ˡ⁻¹⁾ W⁽ˡ⁾ is (N, d_l). Bias b⁽ˡ⁾ broadcasts across rows.
Shape Rules (Sanity Checks)
For a dense layer: out = in @ W + b
- in: (N, d_in)
- W: (d_in, d_out)
- out: (N, d_out)
- b: (1, d_out) or (d_out,) with broadcasting
Before each matmul, verify the inner dimensions match: in.shape[-1] must equal W.shape[0] (both d_in).
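The rules above can be checked in a few lines of NumPy; the sizes here (N=4, d_in=3, d_out=2) are arbitrary, chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
N, d_in, d_out = 4, 3, 2           # hypothetical sizes for illustration

x = rng.standard_normal((N, d_in))
W = rng.standard_normal((d_in, d_out))
b = np.zeros((d_out,))             # 1-D bias broadcasts across the N rows

# Inner dimensions match: (N, d_in) @ (d_in, d_out) -> (N, d_out)
out = x @ W + b
assert out.shape == (N, d_out)
print(out.shape)                   # (4, 2)
```

If the inner dimensions disagree, NumPy raises a ValueError at the `@`, which is exactly the sanity check you want to fail early.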
NumPy: Full Forward Through a Small MLP
Same pattern as the MLP lesson, with each layer written out explicitly for clarity: 784 → 128 → 64 → 10 (like a toy MNIST-style head).
import numpy as np
def relu(z): return np.maximum(0, z)
rng = np.random.default_rng(42)
N, d0, d1, d2, d3 = 32, 784, 128, 64, 10
X = rng.standard_normal((N, d0))
W1 = rng.normal(0, 0.05, (d0, d1))
b1 = np.zeros((1, d1))
W2 = rng.normal(0, 0.05, (d1, d2))
b2 = np.zeros((1, d2))
W3 = rng.normal(0, 0.05, (d2, d3))
b3 = np.zeros((1, d3))
a = X
a = relu(a @ W1 + b1)
a = relu(a @ W2 + b2)
logits = a @ W3 + b3 # (N, 10) — pass to softmax + CE in training
print("logits shape:", logits.shape)
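The same pass can also be written as a loop over (W, b) pairs, applying ReLU on every layer except the last. A minimal sketch, with the arrays redefined so it runs standalone:

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

rng = np.random.default_rng(42)
N, dims = 32, [784, 128, 64, 10]
X = rng.standard_normal((N, dims[0]))

# One (W, b) pair per layer: small random weights, zero biases.
layers = [(rng.normal(0, 0.05, (din, dout)), np.zeros((1, dout)))
          for din, dout in zip(dims[:-1], dims[1:])]

a = X
for i, (W, b) in enumerate(layers):
    z = a @ W + b
    a = z if i == len(layers) - 1 else relu(z)  # no activation on the logits

print("logits shape:", a.shape)                 # (32, 10)
```

Storing layers as a list keeps the forward pass independent of network depth, which is the same idea PyTorch's nn.Sequential implements for you.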
PyTorch: eval(), torch.no_grad()
For inference you disable gradient tracking to save memory and compute. Also set model.eval() so layers like dropout and batch norm use inference behavior.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(784, 128),
    nn.ReLU(),
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Linear(64, 10),
)
model.eval()

x = torch.randn(64, 784)
with torch.no_grad():
    logits = model(x)
    probs = torch.softmax(logits, dim=1)
print(probs.shape, probs[0].sum())
Complexity (Rough Intuition)
A dense layer’s matmul (N×d_in) @ (d_in×d_out) dominates cost: on the order of N × d_in × d_out multiply-adds. Deeper/wider nets multiply this per layer. Convolutions reuse weights over spatial positions and scale differently. For large models, mixed precision (FP16/BF16) and hardware (GPU/TPU) matter as much as algorithm choice.
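For the 784 → 128 → 64 → 10 network above, the per-sample multiply-add count works out as follows (pure arithmetic, ignoring biases and activations, which are comparatively cheap):

```python
dims = [784, 128, 64, 10]

# Each dense layer costs d_in * d_out multiply-adds per sample.
per_layer = [din * dout for din, dout in zip(dims[:-1], dims[1:])]
total = sum(per_layer)

print(per_layer)   # [100352, 8192, 640]
print(total)       # 109184 multiply-adds per sample
```

Note how the first layer dominates: input width matters, which is one reason large inputs are usually downsampled or embedded before dense layers.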
Summary
- Forward propagation = apply layers in order with fixed parameters.
- Batches stack as rows; check matmul shapes at every layer.
- Use eval() + torch.no_grad() for standard PyTorch inference.
- Next in the track: define loss functions on top of these logits.
Next: Loss functions compare predictions to targets and drive learning.