GCSE Link: None

In any database, we want to avoid unnecessary data duplication at all costs. If we don't, it can lead to data redundancy (which can increase the amount of storage space required) and data inconsistency (where the same piece of data apparently has different values).

Database normalisation is the process of organising a database in order to eliminate data redundancy and inconsistency.

Let's take an example database and work through the steps of normalising it:

Orders

OrderID OrderDate Products CustomerName CustomerAddress
17342 15/07/2023 3 x Red T-Shirt, 2 x Blue Jeans John Doe 153 Crown St, Reading
17343 21/07/2023 5 x Black Socks Sarah Brown 26 West Rd, Bracknell
17344 23/07/2023 2 x Blue Jumper, 1 x White Trainers, 1 x Maroon Tracksuit John Doe 153 Crown St, Reading
17345 01/08/2023 3 x Orange Cap Richard Jones 36 Squirrels Way, Earley
17346 01/08/2023 1 x Green Hoodie Richard Jones 36 Squirrels Way, Earley

The first thing we do is make sure all data is atomic (contains only one value). Here, the Products field is not atomic, so we must create another table for it. Also, the CustomerName and CustomerAddress fields can be decomposed further as well.

Orders

OrderID OrderDate CustomerForename CustomerSurname AddressNumber AddressRoad AddressTown
17342 15/07/2023 John Doe 153 Crown St Reading
17343 21/07/2023 Sarah Brown 26 West Rd Bracknell
17344 23/07/2023 John Doe 153 Crown St Reading
17345 01/08/2023 Richard Jones 36 Squirrels Way Earley
17346 01/08/2023 Richard Jones 36 Squirrels Way Earley

ProductInOrder

OrderID ProductType ProductColour Quantity
17342 T-Shirt Red 3
17342 Jeans Blue 2
17343 Socks Black 5
17344 Jumper Blue 2
17344 Trainers White 1
17344 Tracksuit Maroon 1
17345 Cap Orange 3
17346 Hoodie Green 1

However, we still have a problem with possible data duplication in the Orders table, with all of the Customer fields. These belong in their own table.

Orders

OrderID OrderDate CustomerID
17342 15/07/2023 7086
17343 21/07/2023 6227
17344 23/07/2023 7086
17345 01/08/2023 7139
17346 01/08/2023 7139

ProductInOrder

OrderID ProductType ProductColour Quantity
17342 T-Shirt Red 3
17342 Jeans Blue 2
17343 Socks Black 5
17344 Jumper Blue 2
17344 Trainers White 1
17344 Tracksuit Maroon 1
17345 Cap Orange 3
17346 Hoodie Green 1

Customer

CustomerID Forename Surname AddressNumber AddressRoad AddressTown
6227 Sarah Brown 26 West Rd Bracknell
7086 John Doe 153 Crown St Reading
7139 Richard Jones 36 Squirrels Way Earley

We have one final problem. Ideally, the products would be in their own table as well.

Orders

OrderID OrderDate CustomerID
17342 15/07/2023 7086
17343 21/07/2023 6227
17344 23/07/2023 7086
17345 01/08/2023 7139
17346 01/08/2023 7139

ProductInOrder

OrderID ProductID Quantity
17342 23 3
17342 70 2
17343 81 5
17344 47 2
17344 88 1
17344 64 1
17345 95 3
17346 51 1

Customer

CustomerID Forename Surname AddressNumber AddressRoad AddressTown
6227 Sarah Brown 26 West Rd Bracknell
7086 John Doe 153 Crown St Reading
7139 Richard Jones 36 Squirrels Way Earley

Product

ProductID ProductType Colour
23 T-Shirt Red
47 Jumper Blue
51 Hoodie Green
64 Tracksuit Maroon
70 Jeans Blue
81 Socks Black
88 Trainers White
95 Cap Orange

The database is now fully normalised. To verify this, we check if the following statement is true: "all non-key attributes in every table depend only upon the key, the whole key, and nothing but the key".



A data anomaly is an inconsistency in a database. How does normalisation help to avoid data anomalies?

If a database is normalised, each piece of data should be stored in only one place, and therefore only needs to be updated in one place if required. In denormalised databases, data needs to be updated using complex queries, which could lead to data anomalies if not done correctly.