The Perceptron
The perceptron is the simplest artificial neural network: a single artificial neuron that learns to classify inputs into two categories. It was invented by Frank Rosenblatt in 1958, implemented in hardware, and unveiled at a press conference that generated headlines proclaiming that machines would soon think. That enthusiasm was premature, but the underlying idea proved foundational.
Anatomy of a Perceptron
A perceptron takes several numerical inputs and produces a single output.
Inputs (x): numerical values representing features of what you are trying to classify. If you are classifying whether an email is spam, inputs might include the number of exclamation marks, whether the word "free" appears (1 or 0), whether the sender is known (1 or 0), and so on.
Weights (w): each input has an associated weight, a number that scales the importance of that input. A large positive weight means the input strongly pushes the output towards 1. A large negative weight means it pushes strongly towards 0. Weights typically start at zero or at small random values and are adjusted during training.
Bias (b): an extra term added to the weighted sum that is not connected to any input. The bias allows the decision boundary to be shifted, giving the model more flexibility. Think of it as the threshold for firing, expressed in a form that is easier to learn.
Weighted sum: the perceptron computes:
z = (w₁ × x₁) + (w₂ × x₂) + ... + (wₙ × xₙ) + b
Step function: if z is greater than 0, the perceptron outputs 1. If z is 0 or less, it outputs 0. This is the decision.
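The weighted sum and step function described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `predict` and the example weights and bias are assumptions chosen for the spam example, not learned values.

```python
def predict(inputs, weights, bias):
    """Return 1 if the weighted sum plus bias is greater than 0, else 0."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if z > 0 else 0

# Hypothetical features: [exclamation marks, contains "free", sender known]
weights = [0.5, 1.0, -2.0]  # illustrative values, not learned
bias = -1.0

print(predict([3, 1, 0], weights, bias))  # z = 1.5 + 1.0 + 0.0 - 1.0 = 1.5, so 1
print(predict([0, 0, 1], weights, bias))  # z = -2.0 - 1.0 = -3.0, so 0
```

Note that an input producing z exactly equal to 0 is classified as 0, matching the rule above.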
Learning: The Perceptron Update Rule
A perceptron learns by looking at examples one at a time, comparing its prediction to the correct answer and adjusting its weights.
If the perceptron predicts correctly: no change.
If the perceptron predicts 0 but the correct answer is 1: add a small multiple of each input value to its weight, so that the weights of the active (non-zero, positive) inputs increase, and increase the bias. This makes the perceptron more likely to fire on similar inputs in the future.
If the perceptron predicts 1 but the correct answer is 0: subtract a small multiple of each input value from its weight, so that the weights of the active inputs decrease, and decrease the bias.
The learning rate, a small positive number, controls how large each adjustment is. Too large and the weights can overshoot and oscillate. Too small and the model learns slowly.
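The update rule can be written compactly as: new weight = old weight + learning rate × (target − prediction) × input, with the same adjustment (without the input factor) applied to the bias. The sketch below applies it to the logical AND function, which is linearly separable. The names `train`, `learning_rate` and `max_epochs`, the zero initial weights and the value 0.1 are assumptions made for a reproducible example.

```python
def predict(inputs, weights, bias):
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if z > 0 else 0

def train(examples, n_inputs, learning_rate=0.1, max_epochs=100):
    """Apply the perceptron update rule until an epoch has no mistakes."""
    weights = [0.0] * n_inputs  # zeros for reproducibility; random also works
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for inputs, target in examples:
            error = target - predict(inputs, weights, bias)  # +1, 0 or -1
            if error != 0:
                mistakes += 1
                weights = [w + learning_rate * error * x
                           for w, x in zip(weights, inputs)]
                bias += learning_rate * error
        if mistakes == 0:  # every example classified correctly
            break
    return weights, bias

# Logical AND: output 1 only when both inputs are 1.
and_data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
weights, bias = train(and_data, n_inputs=2)
print([predict(x, weights, bias) for x, _ in and_data])  # [0, 0, 0, 1]
```

When the prediction is correct, `error` is 0 and nothing changes, which corresponds to the "no change" case above.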
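Rosenblatt proved that if the training data is linearly separable (meaning you can draw a straight line, or a flat hyperplane in higher dimensions, that perfectly separates the two classes), the perceptron is guaranteed to find such a boundary after a finite number of updates. This result is known as the perceptron convergence theorem. The boundary it finds is one of many possible separating lines, not necessarily the best one.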
The XOR Problem
In 1969, Marvin Minsky and Seymour Papert published a book called Perceptrons that demonstrated a fundamental limitation: a single perceptron cannot solve the XOR (exclusive or) problem.
XOR takes two binary inputs and outputs 1 if exactly one of them is 1:
| Input A | Input B | XOR Output |
|---|---|---|
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 0 |
If you plot these four points, there is no single straight line that separates the 1s from the 0s. The data is not linearly separable. A single perceptron, which can only learn a linear decision boundary, cannot solve it.
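The same conclusion follows from the arithmetic. To output 0 for (0, 0) the bias must satisfy b ≤ 0. To output 1 for (0, 1) and (1, 0) we need w₂ + b > 0 and w₁ + b > 0. Adding those two gives w₁ + w₂ + 2b > 0, so w₁ + w₂ + b > −b ≥ 0. But outputting 0 for (1, 1) requires w₁ + w₂ + b ≤ 0, a contradiction. The sketch below illustrates this empirically by training a perceptron on XOR and recording the best score it ever reaches; the learning rate, zero initial weights and epoch count are arbitrary assumptions.

```python
def predict(inputs, weights, bias):
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if z > 0 else 0

xor_data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]

weights, bias, learning_rate = [0.0, 0.0], 0.0, 0.1
best_correct = 0
for epoch in range(1000):
    for inputs, target in xor_data:
        error = target - predict(inputs, weights, bias)
        weights = [w + learning_rate * error * x
                   for w, x in zip(weights, inputs)]
        bias += learning_rate * error
    # Count how many of the four points are classified correctly this epoch.
    correct = sum(predict(x, weights, bias) == t for x, t in xor_data)
    best_correct = max(best_correct, correct)

print(best_correct)  # always below 4, however many epochs are run
```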
This finding is widely credited with contributing to a sharp decline in funding and interest for neural network research during much of the 1970s. If a single neuron had such a basic limitation, how could neural networks ever be useful?
The Multi-Layer Solution
The answer was already available, but not widely appreciated: stack layers of perceptrons.
With two layers, a network can learn non-linear decision boundaries. The first layer transforms the input space. The second layer draws a linear boundary in the transformed space. The combination solves XOR trivially.
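As an illustration, the sketch below wires up a two-layer network for XOR by hand. The weights are chosen manually rather than learned, and the names `step`, `xor_network`, `h_or` and `h_and` are assumptions for this example. The hidden layer computes OR and AND of the inputs; the output perceptron then fires when OR is true and AND is false, which is a linear rule in the transformed space.

```python
def step(z):
    return 1 if z > 0 else 0

def xor_network(a, b):
    # Hidden layer: two perceptrons that re-describe the input.
    h_or = step(a + b - 0.5)   # fires if at least one input is 1
    h_and = step(a + b - 1.5)  # fires only if both inputs are 1
    # Output layer: a single perceptron, linear in (h_or, h_and).
    return step(h_or - h_and - 0.5)

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, xor_network(a, b))
# 0 0 0
# 0 1 1
# 1 0 1
# 1 1 0
```

Many other weight choices would work equally well; this is only one hand-picked solution.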
A network with:
- An input layer (receives the raw data)
- One or more hidden layers (transform the representation)
- An output layer (produces the final prediction)
...is called a multi-layer perceptron (MLP) or more generally a feedforward neural network.
The word "hidden" simply means the neurons in these layers are not directly observable as inputs or outputs: they are internal to the computation.
This architecture underlies much of modern deep learning. The challenge of training multi-layer networks (how to adjust the weights of hidden neurons, whose outputs are never directly compared with the correct answer) was not solved in practice until the backpropagation algorithm was popularised in the 1980s. We cover that in Module 2.
Quiz: Why can a single perceptron not solve the XOR problem? What architectural change allows this limitation to be overcome?