
The Perceptron

The perceptron is the simplest trainable neural unit: a weighted sum of inputs plus a bias, passed through a step or sign function for binary decisions. Understanding it gives you the geometry of linear classification and the prototype “mistake-driven” learning rule that later generalizes to gradient descent in larger networks.


What Is a Perceptron?

In 1957, Frank Rosenblatt described the perceptron as an algorithm and a simple device that could learn from examples. In modern terms, a (single-layer) perceptron for binary classification computes a linear score

z = w₁x₁ + w₂x₂ + … + w_d x_d + b = w·x + b

and outputs a class label using a threshold. With labels in {-1, +1}, a common convention is ŷ = sign(z) (with some tie-breaking rule at z = 0). With labels in {0, 1}, people often use a step: ŷ = 1 if z ≥ 0 else 0. The weights w and bias b are what we learn from data.

x₁ ──w₁──┐
x₂ ──w₂──┼──► Σ + b ──► activation (step/sign) ──► ŷ
  …      │
x_d ──w_d┘

Connection. If you replace the step with a sigmoid and train with log loss, you get logistic regression—still a linear model, but with smooth probabilities and gradients everywhere. The perceptron instead uses a hard threshold and a rule that only updates on mistakes.

Geometry: The Decision Boundary

The equation w·x + b = 0 defines a hyperplane in input space. Points on one side are classified as one class, points on the other as the second class. The vector w is normal (perpendicular) to that hyperplane; the bias b shifts the plane away from the origin.

In two dimensions, the boundary is a line. For example, if w = [1, -1] and b = 0, the line is x₁ - x₂ = 0 (i.e. x₁ = x₂). Points above/below that line get different labels depending on the sign of z.

Tiny numeric check

Let w = [2, -1], b = -1, and x = [1, 1]. Then z = 2(1) + (-1)(1) - 1 = 0. If we use ŷ = 1 when z ≥ 0, this point lies on the boundary. For x = [2, 0], z = 4 - 0 - 1 = 3 > 0 → one side of the boundary; for x = [0, 2], z = 0 - 2 - 1 = -3 → the other side.
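
The same arithmetic as a quick NumPy check (the weights, bias, and test points are taken directly from the example above):

Numeric check in NumPy
import numpy as np

w = np.array([2.0, -1.0])
b = -1.0

for x in ([1.0, 1.0], [2.0, 0.0], [0.0, 2.0]):
    z = np.dot(w, x) + b
    label = 1 if z >= 0 else -1        # step convention: 1 when z >= 0
    print(x, "-> z =", z, ", label =", label)
# expected z values: 0.0, 3.0, -3.0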

Perceptron Learning Rule

Assume labels y ∈ {-1, +1} and prediction ŷ = sign(z) with z = w·x + b. The classic perceptron update runs only when the example is misclassified (y ≠ ŷ):

  • w ← w + η · y · x
  • b ← b + η · y

Here η > 0 is the learning rate. Intuition: if the true label is +1 but ŷ = -1, the score z was too low; adding a positive multiple of x to w tilts the hyperplane to increase z on that kind of input. If y = -1 and ŷ = +1, the update subtracts a multiple of x.
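
A minimal sketch of one such update (the starting weights and the example point are made up purely for illustration):

One update on a misclassified point
import numpy as np

w = np.array([0.5, -0.5])
b = 0.0
eta = 0.5

x = np.array([2.0, 0.0])
y = -1.0                          # true label

z = np.dot(w, x) + b              # 1.0, so the prediction is +1: a mistake
pred = 1.0 if z >= 0 else -1.0
if pred != y:
    w = w + eta * y * x           # becomes [-0.5, -0.5]
    b = b + eta * y               # becomes -0.5

print(np.dot(w, x) + b)           # -1.5: the same point now scores negative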

Perceptron convergence (informal)

If the data are linearly separable, this algorithm (with suitable η and cycling through examples) finds some separating hyperplane in finite steps. If the data are not linearly separable, updates never settle—the same mistakes recur.

Equivalent view with {0,1} labels

If you encode classes as 0/1, you can map to y ∈ {-1,+1} with y' = 2y - 1, run the rule, and map back—or write the update directly in terms of the error (target - prediction). Consistency of the rule with your chosen activation matters; stick to one convention per implementation.
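
As a minimal sketch of that error-driven form, assuming a step activation and 0/1 labels (the helper name update is just for illustration):

Error-driven update with {0, 1} labels
import numpy as np

def update(w, b, x, target, eta=0.5):
    """One perceptron step written with 0/1 labels and a step activation."""
    pred = 1.0 if np.dot(w, x) + b >= 0 else 0.0
    error = target - pred          # in {-1, 0, +1}; zero when already correct
    w = w + eta * error * x
    b = b + eta * error
    return w, b

w, b = update(np.zeros(2), 0.0, np.array([1.0, 1.0]), target=0.0)
print(w, b)                        # [-0.5 -0.5] -0.5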

NumPy: Train on AND and OR

Logical AND and OR are linearly separable. The snippet below encodes both the inputs and the labels as -1/+1, predicts with sign(z), and applies the perceptron update only on mistakes. After a few epochs the weights separate the four points.

Perceptron training loop (AND)
import numpy as np

def sign(z):
    return np.where(z >= 0, 1, -1)

# AND: (+1 only when both inputs are +1)
# Inputs are encoded as -1/+1; the bias b is kept as a separate scalar (not folded into w)
X = np.array([
    [-1, -1],
    [-1,  1],
    [ 1, -1],
    [ 1,  1],
], dtype=float)
y = np.array([-1, -1, -1, 1], dtype=float)  # AND with {-1, +1}

rng = np.random.default_rng(0)
w = rng.normal(0, 0.1, 2)
b = 0.0
eta = 0.5

for epoch in range(20):
    err = 0
    for xi, target in zip(X, y):
        z = np.dot(w, xi) + b
        pred = sign(z)
        if pred != target:
            err += 1
            w = w + eta * target * xi
            b = b + eta * target
    if err == 0:
        print(f"Converged epoch {epoch}")
        break

print("w =", w, "b =", b)
print("predictions:", [sign(np.dot(w, xi) + b) for xi in X])
Try OR yourself

For OR with {-1,+1}, targets should be [-1, 1, 1, 1] for the same input order. Swap y and rerun; the algorithm should again converge.

y_or = np.array([-1, 1, 1, 1], dtype=float)

Limitation: XOR Is Not Linearly Separable

The XOR function (exclusive or) outputs +1 when inputs differ and -1 when they are equal. In the 2D plane with corners at (-1,-1), (-1,1), (1,-1), (1,1), no single straight line separates the two classes. The perceptron cannot represent XOR with one linear threshold unit.
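
A quick empirical check, reusing the loop structure from the AND example (the 50-epoch limit is arbitrary): the per-epoch mistake count never reaches zero.

Perceptron fails to converge on XOR
import numpy as np

def sign(z):
    return np.where(z >= 0, 1, -1)

# XOR with {-1, +1} encoding: +1 exactly when the two inputs differ
X = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], dtype=float)
y = np.array([-1, 1, 1, -1], dtype=float)

w = np.zeros(2)
b = 0.0
eta = 0.5
for epoch in range(50):
    err = 0
    for xi, target in zip(X, y):
        if sign(np.dot(w, xi) + b) != target:
            err += 1
            w += eta * target * xi
            b += eta * target

print("mistakes in the last epoch:", err)   # stays above zero, however long you run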

This is the famous limitation discussed by Minsky and Papert (1969): one layer of linear threshold units is weak unless you add hidden layers or nonlinear features. A multi-layer perceptron (MLP) with hidden units and nonlinear activations can learn XOR—our next tutorial topic extends the story from one neuron to a stack of layers.

Feature trick. You could map (x₁, x₂) to features like (x₁, x₂, x₁x₂) and then use a linear classifier in that lifted space—conceptually similar to what hidden layers do automatically.
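
A minimal sketch of that lifted-feature trick, reusing the same training loop (the only change is the added third feature x₁x₂):

XOR with a product feature
import numpy as np

def sign(z):
    return np.where(z >= 0, 1, -1)

X = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], dtype=float)
y = np.array([-1, 1, 1, -1], dtype=float)

# Lift (x1, x2) to (x1, x2, x1*x2); XOR is linearly separable in this space
X_lift = np.column_stack([X, X[:, 0] * X[:, 1]])

w = np.zeros(3)
b = 0.0
eta = 0.5
for epoch in range(20):
    err = 0
    for xi, target in zip(X_lift, y):
        if sign(np.dot(w, xi) + b) != target:
            err += 1
            w += eta * target * xi
            b += eta * target
    if err == 0:
        print(f"Converged epoch {epoch}")
        break

print("predictions:", [int(sign(np.dot(w, xi) + b)) for xi in X_lift])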

PyTorch: Single Linear + Threshold (Illustration)

Modern frameworks rarely train with the discrete perceptron rule; they use continuous losses and autograd. For comparison, a single nn.Linear with two inputs and one output is exactly the z = w·x + b part; you would still need a step for a literal perceptron. The snippet shows only the linear part—training it with BCEWithLogitsLoss is closer to logistic regression than to the classical perceptron algorithm.

Linear layer = perceptron pre-activation
import torch
import torch.nn as nn

# One linear neuron: 2 features -> 1 logit
model = nn.Linear(2, 1, bias=True)
x = torch.tensor([[1.0, -1.0], [-1.0, 1.0]])
logits = model(x)
print("logits shape:", logits.shape)
print("weights:", model.weight.data)
print("bias:", model.bias.data)

Summary

  • The perceptron computes z = w·x + b and applies a hard threshold for binary labels.
  • Its decision boundary is a hyperplane; the algorithm moves that hyperplane when it makes mistakes.
  • It converges for linearly separable data; XOR motivates hidden layers (MLPs).
  • Logistic regression keeps linear geometry but uses smooth sigmoid + log loss—different training, similar inductive bias.

Next

Stack multiple neurons and layers with nonlinear activations to go beyond one hyperplane—the multi-layer perceptron (MLP).