19.03 - Database Normalisation

GCSE Link: None

In any database, we want to avoid unnecessary data duplication at all costs. If we don't, it can lead to data redundancy (which can increase the amount of storage space required) and data inconsistency (where the same piece of data apparently has different values).

Database normalisation is the process of organising a database in order to eliminate data redundancy and inconsistency.

Let's take an example database and work through the steps of normalising it:

Orders

OrderID	OrderDate	Products	CustomerName	CustomerAddress
17342	15/07/2023	3 x Red T-Shirt, 2 x Blue Jeans	John Doe	153 Crown St, Reading
17343	21/07/2023	5 x Black Socks	Sarah Brown	26 West Rd, Bracknell
17344	23/07/2023	2 x Blue Jumper, 1 x White Trainers, 1 x Maroon Tracksuit	John Doe	153 Crown St, Reading
17345	01/08/2023	3 x Orange Cap	Richard Jones	36 Squirrels Way, Earley
17346	01/08/2023	1 x Green Hoodie	Richard Jones	36 Squirrels Way, Earley

The first thing we do is make sure all data is atomic (contains only one value). Here, the Products field is not atomic, so we must create another table for it. Also, the CustomerName and CustomerAddress fields can be decomposed further as well.

Orders

OrderID	OrderDate	CustomerForename	CustomerSurname	AddressNumber	AddressRoad	AddressTown
17342	15/07/2023	John	Doe	153	Crown St	Reading
17343	21/07/2023	Sarah	Brown	26	West Rd	Bracknell
17344	23/07/2023	John	Doe	153	Crown St	Reading
17345	01/08/2023	Richard	Jones	36	Squirrels Way	Earley
17346	01/08/2023	Richard	Jones	36	Squirrels Way	Earley

ProductInOrder

OrderID	ProductType	ProductColour	Quantity
17342	T-Shirt	Red	3
17342	Jeans	Blue	2
17343	Socks	Black	5
17344	Jumper	Blue	2
17344	Trainers	White	1
17344	Tracksuit	Maroon	1
17345	Cap	Orange	3
17346	Hoodie	Green	1

However, we still have a problem with possible data duplication in the Orders table, with all of the Customer fields. These belong in their own table.

Orders

OrderID	OrderDate	CustomerID
17342	15/07/2023	7086
17343	21/07/2023	6227
17344	23/07/2023	7086
17345	01/08/2023	7139
17346	01/08/2023	7139

ProductInOrder

OrderID	ProductType	ProductColour	Quantity
17342	T-Shirt	Red	3
17342	Jeans	Blue	2
17343	Socks	Black	5
17344	Jumper	Blue	2
17344	Trainers	White	1
17344	Tracksuit	Maroon	1
17345	Cap	Orange	3
17346	Hoodie	Green	1

Customer

CustomerID	Forename	Surname	AddressNumber	AddressRoad	AddressTown
6227	Sarah	Brown	26	West Rd	Bracknell
7086	John	Doe	153	Crown St	Reading
7139	Richard	Jones	36	Squirrels Way	Earley

We have one final problem. Ideally, the products would be in their own table as well.

Orders

OrderID	OrderDate	CustomerID
17342	15/07/2023	7086
17343	21/07/2023	6227
17344	23/07/2023	7086
17345	01/08/2023	7139
17346	01/08/2023	7139

ProductInOrder

OrderID	ProductID	Quantity
17342	23	3
17342	70	2
17343	81	5
17344	47	2
17344	88	1
17344	64	1
17345	95	3
17346	51	1

Customer

CustomerID	Forename	Surname	AddressNumber	AddressRoad	AddressTown
6227	Sarah	Brown	26	West Rd	Bracknell
7086	John	Doe	153	Crown St	Reading
7139	Richard	Jones	36	Squirrels Way	Earley

Product

ProductID	ProductType	Colour
23	T-Shirt	Red
47	Jumper	Blue
51	Hoodie	Green
64	Tracksuit	Maroon
70	Jeans	Blue
81	Socks	Black
88	Trainers	White
95	Cap	Orange

The database is now fully normalised. To verify this, we check if the following statement is true: "all non-key attributes in every table depend only upon the key, the whole key, and nothing but the key".

A data anomaly is an inconsistency in a database. How does normalisation help to avoid data anomalies?

If a database is normalised, each piece of data should be stored in only one place, and therefore only needs to be updated in one place if required. In denormalised databases, data needs to be updated using complex queries, which could lead to data anomalies if not done correctly.