Normalisation: 1NF, 2NF and 3NF
Normalisation is the process of structuring a relational schema so that each fact is stored in exactly one place. It is the difference between a database that scales gracefully and one that develops increasingly painful update anomalies as it grows. This lesson covers the first three normal forms with worked examples. The next lesson covers BCNF and the equally important question of when to deliberately break the rules.
The starting point: an unnormalised table
Imagine a small online bookstore that started life as a single Orders spreadsheet. Each row is one line item from one order:
| order_id, customer_email, book_titles, quantities, total | ||
|---|---|---|
| 1001, thandi@example.com, "Disgrace,The Hobbit", "1,2", 700.00 | ||
| 1002, pieter@example.com, "Born a Crime", "1", 310.00 | ||
| 1003, thandi@example.com, "Disgrace,Born a Crime,Long Walk", "2,1,1", 1130.00 |
This shape is how spreadsheets look. It is not how databases should look. Three problems:
Normalisation is the systematic fix.
1NF: First Normal Form
A table is in 1NF when:
Fixing our bookstore: split the comma-separated columns out into rows.
Orders
+----------+--------------------+----------+
| order_id | customer_email | total |
+----------+--------------------+----------+
| 1001 | thandi@example.com | 700.00 |
| 1002 | pieter@example.com | 310.00 |
| 1003 | thandi@example.com | 1130.00 |
+----------+--------------------+----------+
OrderItems
+----------+--------------------+----------+-------+
| order_id | book_title | quantity | price |
+----------+--------------------+----------+-------+
| 1001 | Disgrace | 1 | 250 |
| 1001 | The Hobbit | 2 | 225 |
| 1002 | Born a Crime | 1 | 310 |
| 1003 | Disgrace | 2 | 250 |
| 1003 | Born a Crime | 1 | 310 |
| 1003 | Long Walk | 1 | 320 |
+----------+--------------------+----------+-------+
That is 1NF. Atomic cells, consistent types, unique rows (when you treat (order_id, book_title) as the composite key).
2NF: Second Normal Form
A table is in 2NF when it is in 1NF AND every non-key column depends on the WHOLE primary key, not just part of it.
This only matters when the primary key is composite (multiple columns).
Look at OrderItems again. The composite key is (order_id, book_title). The columns are quantity and price.
Fix: move price out into a Books table.
Books
+--------------+-------+
| book_title | price |
+--------------+-------+
| Disgrace | 250 |
| The Hobbit | 225 |
| Born a Crime | 310 |
| Long Walk | 320 |
+--------------+-------+
OrderItems
+----------+--------------+----------+
| order_id | book_title | quantity |
+----------+--------------+----------+
| 1001 | Disgrace | 1 |
| 1001 | The Hobbit | 2 |
| 1002 | Born a Crime | 1 |
| 1003 | Disgrace | 2 |
| 1003 | Born a Crime | 1 |
| 1003 | Long Walk | 1 |
+----------+--------------+----------+
Now OrderItems is in 2NF. Every non-key column (just quantity) depends on the whole composite key.
3NF: Third Normal Form
A table is in 3NF when it is in 2NF AND no non-key column depends on another non-key column. The technical phrase is "no transitive dependencies".
Look at Orders. The columns are customer_email and total. Suppose we add the customer's address, name and city to the Orders table directly:
Orders (not yet 3NF)
+----------+--------------------+--------------+--------------+--------+
| order_id | customer_email | customer_name| customer_city| total |
+----------+--------------------+--------------+--------------+--------+
| 1001 | thandi@example.com | Thandi Khoza | Cape Town | 700 |
| 1002 | pieter@example.com | Pieter de Wet| Johannesburg | 310 |
| 1003 | thandi@example.com | Thandi Khoza | Cape Town | 1130 |
+----------+--------------------+--------------+--------------+--------+
The order_id determines customer_email. Fine. But customer_email also determines customer_name and customer_city. So the dependency is order_id -> customer_email -> customer_name / city. Two steps. That is a transitive dependency.
The fix is the same idea as 2NF: pull the dependent columns into their own table.
Customers
+----------+--------------------+--------------+--------------+
| email | name | city | |
+----------+--------------------+--------------+--------------+
| thandi@..| Thandi Khoza | Cape Town | |
| pieter@..| Pieter de Wet | Johannesburg | |
+----------+--------------------+--------------+--------------+
Orders (3NF)
+----------+--------------------+--------+
| order_id | customer_email | total |
+----------+--------------------+--------+
| 1001 | thandi@example.com | 700 |
| 1002 | pieter@example.com | 310 |
| 1003 | thandi@example.com | 1130 |
+----------+--------------------+--------+
Now Orders is in 3NF. The only non-key column is total, which depends on order_id (the key) and nothing else.
A real worked example
Here is the bookstore after all three normal forms applied:
Customers (3NF)
- id (PK)
- email (UNIQUE)
- name
- city
Books (3NF)
- id (PK)
- title
- price
- category_id (FK -> Categories.id)
Categories (3NF)
- id (PK)
- name
Orders (3NF)
- id (PK)
- customer_id (FK -> Customers.id)
- placed_at
- status
OrderItems (3NF)
- order_id (FK -> Orders.id)
- book_id (FK -> Books.id)
- quantity
- price_at_purchase -- intentionally denormalised; see L2
- PK (order_id, book_id)
Five tables. Every fact about a customer lives in Customers. Every fact about a book lives in Books. Categories are a separate table because pulling them out makes adding a new category easy without touching Books schema.
Notice price_at_purchase in OrderItems is a deliberate denormalisation — the price is also in Books, but capturing it at purchase time means historical orders survive price changes. We cover when denormalisation is the right call in the next lesson.
Why aim for 3NF as the baseline
The trade-off coming next
3NF makes writes safe and storage small. It can make reads slower (more JOINs to reassemble a view) and harder to optimise for analytics. The next lesson covers BCNF (a stricter form) and the legitimate cases for denormalising back down.
- 1NF: atomic cells, single data type per column, unique rows. No comma-separated lists, no repeating groups.
- 2NF: every non-key column depends on the WHOLE composite key. No partial dependencies. Only matters when the key is composite.
- 3NF: no non-key column depends on another non-key column. No transitive dependencies.
- Bill Kent: "the key, the whole key, and nothing but the key". Single sentence that captures all three.
- 3NF is the standard baseline for OLTP schemas. Most production tables you will design or touch should be in 3NF unless there is a deliberate reason to deviate.