
The Perceptron

The perceptron is the simplest trainable neural unit: a weighted sum of inputs plus a bias, passed through a step or sign function for binary decisions. Understanding it gives you the geometry of linear classification and the prototype “mistake-driven” learning rule that later generalizes to gradient descent in larger networks.


What Is a Perceptron?

In 1957, Frank Rosenblatt described the perceptron as an algorithm and a simple device that could learn from examples. In modern terms, a (single-layer) perceptron for binary classification computes a linear score

z = w₁x₁ + w₂x₂ + … + w_d x_d + b = w·x + b

and outputs a class label using a threshold. With labels in {-1, +1}, a common convention is ŷ = sign(z) (with some tie-breaking rule at z = 0). With labels in {0, 1}, people often use a step: ŷ = 1 if z ≥ 0 else 0. The weights w and bias b are what we learn from data.

x₁ ──w₁──┐
x₂ ──w₂──┼──► Σ + b ──► activation (step/sign) ──► ŷ
  …      │
x_d ──w_d┘

Connection. If you replace the step with a sigmoid and train with log loss, you get logistic regression—still a linear model, but with smooth probabilities and gradients everywhere. The perceptron instead uses a hard threshold and a rule that only updates on mistakes.

Geometry: The Decision Boundary

The equation w·x + b = 0 defines a hyperplane in input space. Points on one side are classified as one class, points on the other as the second class. The vector w is normal (perpendicular) to that hyperplane; the bias b shifts the plane away from the origin.

In two dimensions, the boundary is a line. For example, if w = [1, -1] and b = 0, the line is x₁ - x₂ = 0 (i.e. x₁ = x₂). Points above/below that line get different labels depending on the sign of z.

Tiny numeric check

Let w = [2, -1], b = -1, and x = [1, 1]. Then z = 2(1) + (-1)(1) - 1 = 0. If we use ŷ = 1 when z ≥ 0, this point lies on the boundary. For x = [2, 0], z = 4 - 0 - 1 = 3 > 0 → one side of the boundary; for x = [0, 2], z = 0 - 2 - 1 = -3 → the other side.
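
The same arithmetic as a quick NumPy check (the weights, bias, and test points are taken directly from the example above):

Numeric check in NumPy
import numpy as np

w = np.array([2.0, -1.0])
b = -1.0

for x in ([1.0, 1.0], [2.0, 0.0], [0.0, 2.0]):
    z = np.dot(w, x) + b
    label = 1 if z >= 0 else -1        # step convention: 1 when z >= 0
    print(x, "-> z =", z, ", label =", label)
# expected z values: 0.0, 3.0, -3.0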

Perceptron Learning Rule

Assume labels y ∈ {-1, +1} and prediction ŷ = sign(z) with z = w·x + b. The classic perceptron update runs only when the example is misclassified (y ≠ ŷ):

  • w ← w + η · y · x
  • b ← b + η · y

Here η > 0 is the learning rate. Intuition: if the true label is +1 but ŷ = -1, the score z was too low; adding a positive multiple of x to w tilts the hyperplane to increase z on that kind of input. If y = -1 and ŷ = +1, the update subtracts a multiple of x.
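
A minimal sketch of one such update (the starting weights and the example point are made up purely for illustration):

One update on a misclassified point
import numpy as np

w = np.array([0.5, -0.5])
b = 0.0
eta = 0.5

x = np.array([2.0, 0.0])
y = -1.0                          # true label

z = np.dot(w, x) + b              # 1.0, so the prediction is +1: a mistake
pred = 1.0 if z >= 0 else -1.0
if pred != y:
    w = w + eta * y * x           # becomes [-0.5, -0.5]
    b = b + eta * y               # becomes -0.5

print(np.dot(w, x) + b)           # -1.5: the same point now scores negative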

Perceptron convergence (informal)

If the data are linearly separable, this algorithm (with suitable η and cycling through examples) finds some separating hyperplane in finite steps. If the data are not linearly separable, updates never settle—the same mistakes recur.

Equivalent view with {0,1} labels

If you encode classes as 0/1, you can map to y ∈ {-1,+1} with y' = 2y - 1, run the rule, and map back—or write the update directly in terms of the error (target - prediction). Consistency of the rule with your chosen activation matters; stick to one convention per implementation.
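
As a minimal sketch of that error-driven form, assuming a step activation and 0/1 labels (the helper name update is just for illustration):

Error-driven update with {0, 1} labels
import numpy as np

def update(w, b, x, target, eta=0.5):
    """One perceptron step written with 0/1 labels and a step activation."""
    pred = 1.0 if np.dot(w, x) + b >= 0 else 0.0
    error = target - pred          # in {-1, 0, +1}; zero when already correct
    w = w + eta * error * x
    b = b + eta * error
    return w, b

w, b = update(np.zeros(2), 0.0, np.array([1.0, 1.0]), target=0.0)
print(w, b)                        # [-0.5 -0.5] -0.5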

NumPy: Train on AND and OR

Logical AND and OR are linearly separable. The snippet below encodes both the inputs and the labels as -1/+1, predicts with sign(z), and applies the perceptron update only on mistakes. After a few epochs the weights separate the four points.

Perceptron training loop (AND)
import numpy as np

def sign(z):
    return np.where(z >= 0, 1, -1)

# AND: (+1 only when both inputs are +1)
# Inputs are encoded as -1/+1; the bias b is kept as a separate scalar (not folded into w)
X = np.array([
    [-1, -1],
    [-1,  1],
    [ 1, -1],
    [ 1,  1],
], dtype=float)
y = np.array([-1, -1, -1, 1], dtype=float)  # AND with {-1, +1}

rng = np.random.default_rng(0)
w = rng.normal(0, 0.1, 2)
b = 0.0
eta = 0.5

for epoch in range(20):
    err = 0
    for xi, target in zip(X, y):
        z = np.dot(w, xi) + b
        pred = sign(z)
        if pred != target:
            err += 1
            w = w + eta * target * xi
            b = b + eta * target
    if err == 0:
        print(f"Converged epoch {epoch}")
        break

print("w =", w, "b =", b)
print("predictions:", [sign(np.dot(w, xi) + b) for xi in X])
Try OR yourself

For OR with {-1,+1}, targets should be [-1, 1, 1, 1] for the same input order. Swap y and rerun; the algorithm should again converge.

y_or = np.array([-1, 1, 1, 1], dtype=float)

Limitation: XOR Is Not Linearly Separable

The XOR function (exclusive or) outputs +1 when inputs differ and -1 when they are equal. In the 2D plane with corners at (-1,-1), (-1,1), (1,-1), (1,1), no single straight line separates the two classes. The perceptron cannot represent XOR with one linear threshold unit.
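
A quick empirical check, reusing the loop structure from the AND example (the 50-epoch limit is arbitrary): the per-epoch mistake count never reaches zero.

Perceptron fails to converge on XOR
import numpy as np

def sign(z):
    return np.where(z >= 0, 1, -1)

# XOR with {-1, +1} encoding: +1 exactly when the two inputs differ
X = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], dtype=float)
y = np.array([-1, 1, 1, -1], dtype=float)

w = np.zeros(2)
b = 0.0
eta = 0.5
for epoch in range(50):
    err = 0
    for xi, target in zip(X, y):
        if sign(np.dot(w, xi) + b) != target:
            err += 1
            w += eta * target * xi
            b += eta * target

print("mistakes in the last epoch:", err)   # stays above zero, however long you run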

This is the famous limitation discussed by Minsky and Papert (1969): one layer of linear threshold units is weak unless you add hidden layers or nonlinear features. A multi-layer perceptron (MLP) with hidden units and nonlinear activations can learn XOR—our next tutorial topic extends the story from one neuron to a stack of layers.

Feature trick. You could map (x₁, x₂) to features like (x₁, x₂, x₁x₂) and then use a linear classifier in that lifted space—conceptually similar to what hidden layers do automatically.
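
A minimal sketch of that lifted-feature trick, reusing the same training loop (the only change is the added third feature x₁x₂):

XOR with a product feature
import numpy as np

def sign(z):
    return np.where(z >= 0, 1, -1)

X = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], dtype=float)
y = np.array([-1, 1, 1, -1], dtype=float)

# Lift (x1, x2) to (x1, x2, x1*x2); XOR is linearly separable in this space
X_lift = np.column_stack([X, X[:, 0] * X[:, 1]])

w = np.zeros(3)
b = 0.0
eta = 0.5
for epoch in range(20):
    err = 0
    for xi, target in zip(X_lift, y):
        if sign(np.dot(w, xi) + b) != target:
            err += 1
            w += eta * target * xi
            b += eta * target
    if err == 0:
        print(f"Converged epoch {epoch}")
        break

print("predictions:", [int(sign(np.dot(w, xi) + b)) for xi in X_lift])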

PyTorch: Single Linear + Threshold (Illustration)

Modern frameworks rarely train with the discrete perceptron rule; they use continuous losses and autograd. For comparison, a single nn.Linear with two inputs and one output is exactly the z = w·x + b part; you would still need a step for a literal perceptron. The snippet shows only the linear part—training it with BCEWithLogitsLoss is closer to logistic regression than to the classical perceptron algorithm.

Linear layer = perceptron pre-activation
import torch
import torch.nn as nn

# One linear neuron: 2 features -> 1 logit
model = nn.Linear(2, 1, bias=True)
x = torch.tensor([[1.0, -1.0], [-1.0, 1.0]])
logits = model(x)
print("logits shape:", logits.shape)
print("weights:", model.weight.data)
print("bias:", model.bias.data)

Summary

  • The perceptron computes z = w·x + b and applies a hard threshold for binary labels.
  • Its decision boundary is a hyperplane; the algorithm moves that hyperplane when it makes mistakes.
  • It converges for linearly separable data; XOR motivates hidden layers (MLPs).
  • Logistic regression keeps linear geometry but uses smooth sigmoid + log loss—different training, similar inductive bias.

Next

Stack multiple neurons and layers with nonlinear activations to go beyond one hyperplane—the multi-layer perceptron (MLP).