The Perceptron
The perceptron is the simplest trainable neural unit: a weighted sum of inputs plus a bias, passed through a step or sign function for binary decisions. Understanding it gives you the geometry of linear classification and the prototype "mistake-driven" learning rule that later generalizes to gradient descent in larger networks.
What Is a Perceptron?
In 1957, Frank Rosenblatt described the perceptron as an algorithm and a simple device that could learn from examples. In modern terms, a (single-layer) perceptron for binary classification computes a linear score
z = w₁x₁ + w₂x₂ + … + w_d x_d + b = w·x + b
and outputs a class label using a threshold. With labels in {-1, +1}, a common convention is ŷ = sign(z) (with some tie-breaking rule at z = 0). With labels in {0, 1}, people often use a step: ŷ = 1 if z ≥ 0 else 0. The weights w and bias b are what we learn from data.
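To make the two conventions concrete, here is a minimal sketch; the weights and input below are made up for illustration:
import numpy as np

w = np.array([1.0, -1.0])          # illustrative weights
b = 0.5                            # illustrative bias
x = np.array([2.0, 0.5])           # one input example

z = np.dot(w, x) + b               # linear score z = w·x + b
label_pm1 = 1 if z >= 0 else -1    # sign convention for {-1, +1}, ties sent to +1
label_01 = 1 if z >= 0 else 0      # step convention for {0, 1}
print(z, label_pm1, label_01)      # 2.0 1 1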
Geometry: The Decision Boundary
The equation w·x + b = 0 defines a hyperplane in input space. Points on one side are classified as one class, points on the other as the second class. The vector w is normal (perpendicular) to that hyperplane; the bias b shifts the plane away from the origin.
In two dimensions, the boundary is a line. For example, if w = [1, -1] and b = 0, the line is x₁ - x₂ = 0 (i.e. x₁ = x₂). Points above/below that line get different labels depending on the sign of z.
Tiny numeric check
Let w = [2, -1], b = -1, and x = [1, 1]. Then z = 2(1) + (-1)(1) - 1 = 0. If we use ŷ = 1 when z ≥ 0, this point lies on the boundary. For x = [2, 0], z = 4 - 0 - 1 = 3 > 0 → one side of the boundary; for x = [0, 2], z = 0 - 2 - 1 = -3 → the other side.
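The same arithmetic in NumPy, for anyone who wants to verify the three points above:
import numpy as np

w = np.array([2.0, -1.0])
b = -1.0
for x in ([1.0, 1.0], [2.0, 0.0], [0.0, 2.0]):
    z = np.dot(w, x) + b
    print(x, z, 1 if z >= 0 else -1)
# [1.0, 1.0]  0.0  1   (on the boundary, ties go to +1)
# [2.0, 0.0]  3.0  1
# [0.0, 2.0] -3.0 -1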
Perceptron Learning Rule
Assume labels y ∈ {-1, +1} and prediction ŷ = sign(z) with z = w·x + b. The classic perceptron update runs only when the example is misclassified (y ≠ ŷ):
- w ← w + η · y · x
- b ← b + η · y
Here η > 0 is the learning rate. Intuition: if the true label is +1 but ŷ = -1, the score z was too low; adding a positive multiple of x to w tilts the hyperplane to increase z on that kind of input. If y = -1 and ŷ = +1, the update subtracts a multiple of x.
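Here is one update step written out (a sketch with made-up values, starting from zero weights):
import numpy as np

eta = 0.5
w, b = np.array([0.0, 0.0]), 0.0
x, y = np.array([1.0, -1.0]), -1.0   # true label is -1

z = np.dot(w, x) + b                  # 0.0, so sign(z) predicts +1 (ties go to +1)
y_hat = 1 if z >= 0 else -1
if y_hat != y:                        # mistake: subtract a multiple of x from w
    w = w + eta * y * x               # w becomes [-0.5, 0.5]
    b = b + eta * y                   # b becomes -0.5

print(np.dot(w, x) + b)               # -1.5: the score moved toward the correct sign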
Perceptron convergence (informal)
If the data are linearly separable, this algorithm (with suitable η and cycling through examples) finds some separating hyperplane in finite steps. If the data are not linearly separable, updates never settle—the same mistakes recur.
Equivalent view with {0,1} labels
If you encode classes as 0/1, you can map to y ∈ {-1, +1} with y' = 2y - 1, run the rule, and map back, or you can write the update directly in terms of the error (target - prediction). Consistency of the rule with your chosen activation matters; stick to one convention per implementation.
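As a sketch of the error form with {0, 1} labels (variable names here are illustrative):
import numpy as np

eta = 0.5
w, b = np.array([0.0, 0.0]), 0.0
x, t = np.array([1.0, 1.0]), 0.0     # target is class 0

z = np.dot(w, x) + b
y_hat = 1.0 if z >= 0 else 0.0       # step activation for {0, 1} labels
error = t - y_hat                    # -1, 0, or +1; zero means no update
w = w + eta * error * x
b = b + eta * error

y_pm1 = 2 * t - 1                    # the mapping to {-1, +1}: 0 -> -1, 1 -> +1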
NumPy: Train on AND and OR
Logical AND and OR on two binary inputs are linearly separable. The snippet below uses labels -1 and +1, prediction sign(z), and the perceptron update on mistakes. After enough epochs, weights should separate the points.
import numpy as np

def sign(z):
    return np.where(z >= 0, 1, -1)

# AND: output is +1 only when both inputs are +1
X = np.array([
    [-1, -1],
    [-1,  1],
    [ 1, -1],
    [ 1,  1],
], dtype=float)
y = np.array([-1, -1, -1, 1], dtype=float)  # AND targets with {-1, +1}

rng = np.random.default_rng(0)
w = rng.normal(0, 0.1, 2)   # small random initial weights
b = 0.0                     # bias kept as a separate scalar
eta = 0.5                   # learning rate

for epoch in range(20):
    err = 0
    for xi, target in zip(X, y):
        z = np.dot(w, xi) + b
        pred = sign(z)
        if pred != target:          # perceptron rule: update only on mistakes
            err += 1
            w = w + eta * target * xi
            b = b + eta * target
    if err == 0:
        print(f"Converged epoch {epoch}")
        break

print("w =", w, "b =", b)
print("predictions:", [sign(np.dot(w, xi) + b) for xi in X])
Try OR yourself
For OR with {-1,+1}, targets should be [-1, 1, 1, 1] for the same input order. Swap y and rerun; the algorithm should again converge.
y_or = np.array([-1, 1, 1, 1], dtype=float)
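One convenient way to rerun is to wrap the training loop in a small helper; train_perceptron below is a name introduced here for illustration, and it reuses X and sign from the AND example:
def train_perceptron(X, y, eta=0.5, epochs=20, seed=0):
    # Same mistake-driven loop as above, packaged for reuse.
    rng = np.random.default_rng(seed)
    w, b = rng.normal(0, 0.1, X.shape[1]), 0.0
    for epoch in range(epochs):
        err = 0
        for xi, target in zip(X, y):
            if sign(np.dot(w, xi) + b) != target:
                err += 1
                w = w + eta * target * xi
                b = b + eta * target
        if err == 0:
            break
    return w, b

w_or, b_or = train_perceptron(X, y_or)
print("OR predictions:", [int(sign(np.dot(w_or, xi) + b_or)) for xi in X])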
Limitation: XOR Is Not Linearly Separable
The XOR function (exclusive or) outputs +1 when inputs differ and -1 when they are equal. In the 2D plane with corners at (-1,-1), (-1,1), (1,-1), (1,1), no single straight line separates the two classes. The perceptron cannot represent XOR with one linear threshold unit.
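You can check this empirically: run the same mistake-driven loop on XOR targets and the per-epoch mistake count never reaches zero (this sketch reuses X and sign from the AND example):
y_xor = np.array([-1, 1, 1, -1], dtype=float)   # +1 exactly when the inputs differ

w, b = np.zeros(2), 0.0
for epoch in range(100):
    err = 0
    for xi, target in zip(X, y_xor):
        if sign(np.dot(w, xi) + b) != target:
            err += 1
            w = w + 0.5 * target * xi
            b = b + 0.5 * target
print("mistakes in last epoch:", err)           # never reaches 0, however long you train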
This is the famous limitation discussed by Minsky and Papert (1969): one layer of linear threshold units is weak unless you add hidden layers or nonlinear features. A multi-layer perceptron (MLP) with hidden units and nonlinear activations can learn XOR—our next tutorial topic extends the story from one neuron to a stack of layers.
PyTorch: Single Linear + Threshold (Illustration)
Modern frameworks rarely train with the discrete perceptron rule; they use continuous losses and autograd. For comparison, a single nn.Linear with two inputs and one output is exactly the z = w·x + b part; you would still need a step for a literal perceptron. The snippet shows only the linear part—training it with BCEWithLogitsLoss is closer to logistic regression than to the classical perceptron algorithm.
import torch
import torch.nn as nn
# One linear neuron: 2 features -> 1 logit
model = nn.Linear(2, 1, bias=True)
x = torch.tensor([[1.0, -1.0], [-1.0, 1.0]])
logits = model(x)
print("logits shape:", logits.shape)
print("weights:", model.weight.data)
print("bias:", model.bias.data)
Summary
- The perceptron computes z = w·x + b and applies a hard threshold for binary labels.
- Its decision boundary is a hyperplane; the algorithm moves that hyperplane when it makes mistakes.
- It converges for linearly separable data; XOR motivates hidden layers (MLPs).
- Logistic regression keeps linear geometry but uses smooth sigmoid + log loss—different training, similar inductive bias.
Next
Stack multiple neurons and layers with nonlinear activations to go beyond one hyperplane—the multi-layer perceptron (MLP).