Multi-Layer Perceptron (MLP)
An MLP stacks fully connected layers with nonlinear activations between them. One hidden layer is enough to solve problems like XOR that a single perceptron cannot. This page explains the structure, notation, and small forward-pass examples you can run.
What Is an MLP?
A multi-layer perceptron is a feedforward network: data flows from input → hidden layer(s) → output with no cycles. Each layer applies an affine map (linear transform plus bias) followed by an element-wise activation σ; the output layer often uses a different activation or none at all (e.g. raw logits fed to softmax cross-entropy).
The perceptron is a single linear threshold unit. An MLP adds hidden neurons so the model can represent non-convex decision regions built from piecewise-linear (ReLU) or smooth (sigmoid/tanh) building blocks.
Layer Equations
Let a⁽⁰⁾ = x be the input. For layer l = 1 … L:
- z⁽ˡ⁾ = W⁽ˡ⁾ a⁽ˡ⁻¹⁾ + b⁽ˡ⁾ (matrix form: weights multiply activations from the previous layer)
- a⁽ˡ⁾ = σ⁽ˡ⁾(z⁽ˡ⁾) (element-wise; σ may differ per layer)
With W⁽ˡ⁾ ∈ ℝ^(dₗ × dₗ₋₁), z⁽ˡ⁾ has length dₗ. For a mini-batch of N examples, stack examples as rows so A⁽ˡ⁻¹⁾ is N × dₗ₋₁ and compute Z⁽ˡ⁾ = A⁽ˡ⁻¹⁾ W⁽ˡ⁾ᵀ + b⁽ˡ⁾; storing the weights already transposed, as the NumPy example below does, makes this a plain matrix multiply.
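Composing two such layers gives the whole network as one expression: ŷ = σ⁽²⁾(W⁽²⁾ σ⁽¹⁾(W⁽¹⁾ x + b⁽¹⁾) + b⁽²⁾). This is exactly what the forward-pass code below computes step by step.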
Why Hidden Layers? The XOR Example
No single line separates XOR in the original 2D input space. A small MLP can map inputs into a new space where XOR is linearly separable. A classic minimal pattern is 2 inputs → 2 hidden units (with nonlinearity) → 1 output.
Intuition: hidden units act as feature detectors (e.g. “both on”, “both off”, “exclusive”). The last layer combines them with a linear decision.
You do not need to memorize XOR weights; modern training (gradient descent) discovers useful hidden representations from data. The important lesson: depth + nonlinearity unlocks functions a single linear boundary cannot express.
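Still, it helps to see one concrete solution. A minimal sketch, assuming a 2 → 2 → 1 ReLU network with hand-picked weights (gradient descent would find different weights that work equally well):
import numpy as np
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])  # all four XOR inputs
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])                     # both hidden units sum the inputs
b1 = np.array([0.0, -1.0])                      # unit 1: x1 + x2, unit 2: x1 + x2 - 1
h = np.maximum(0, X @ W1 + b1)                  # ReLU hidden layer, shape (4, 2)
y = h @ np.array([1.0, -2.0])                   # output = h1 - 2*h2
print(y)                                        # [0. 1. 1. 0.] -- XOR of each input pair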
NumPy: Forward Pass Through a 2-Layer MLP
Toy architecture: 2 → 4 → 1 with ReLU in the hidden layer. We use random weights only to show shape flow; training would adjust W, b to fit data.
import numpy as np
def relu(z): return np.maximum(0, z)
rng = np.random.default_rng(7)
# batch N=3, input dim 2
X = rng.normal(0, 1, (3, 2))
W1 = rng.normal(0, 0.3, (2, 4))  # stored (in, out), i.e. the transpose of W⁽¹⁾ above, so X @ W1 works
b1 = np.zeros((1, 4))            # bias row broadcasts across the batch
W2 = rng.normal(0, 0.3, (4, 1))
b2 = np.zeros((1, 1))
z1 = X @ W1 + b1 # (3, 2) @ (2, 4) → (3, 4)
a1 = relu(z1)
z2 = a1 @ W2 + b2 # (3, 4) @ (4, 1) → (3, 1)
# For binary logits, z2 is enough; for probabilities add sigmoid in training setup
print("hidden shape:", a1.shape, "output shape:", z2.shape)
PyTorch: nn.Sequential
Frameworks package affine + activation as modules. A compact MLP for 10-dimensional inputs and 3 classes might look like this:
import torch
import torch.nn as nn
model = nn.Sequential(
    nn.Linear(10, 32),
    nn.ReLU(),
    nn.Linear(32, 32),
    nn.ReLU(),
    nn.Linear(32, 3),  # logits for 3 classes → CrossEntropyLoss
)
x = torch.randn(16, 10) # batch 16
logits = model(x)
print(logits.shape) # torch.Size([16, 3])
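To turn this into training, pair the raw logits with nn.CrossEntropyLoss. A minimal sketch, assuming random stand-in labels and an illustrative learning rate (neither comes from the model above):
criterion = nn.CrossEntropyLoss()                        # expects raw logits + integer class labels
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # illustrative optimizer and learning rate
y = torch.randint(0, 3, (16,))                           # stand-in labels for the batch of 16
for step in range(100):
    optimizer.zero_grad()
    loss = criterion(model(x), y)                        # forward pass + loss on the logits
    loss.backward()                                      # backpropagate gradients
    optimizer.step()                                     # update W, b in every layer
print("final loss:", loss.item())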
Summary
- MLP = stacked dense layers with nonlinear activations between them (the output layer is often linear).
- Notation: z = Wa + b, then a' = σ(z), repeated.
- Hidden layers fix limitations of a single perceptron (e.g. XOR).
- Batching stacks examples as rows; matrix multiply handles all at once.