When you press the keys on your keyboard, the keyboard sends binary information to the computer. How does the computer know what the binary information means?
A character set is a collection of characters that a computer can recognise from their binary representation.
Computers can therefore use this character set to convert between characters and binary numbers.
Modern computers use two main character sets: ASCII and Unicode.
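To illustrate the idea, here is a minimal Python sketch of a character set as a lookup table; the table and its name are made up for this example (the values happen to match ASCII):

```python
# A toy "character set": a tiny table mapping characters to code values.
toy_charset = {'A': 65, 'B': 66, 'C': 67}

code = toy_charset['B']        # character -> number
print(format(code, '08b'))     # 01000010 -- the binary pattern the computer stores
print(code)                    # 66
```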
The American Standard Code for Information Interchange (ASCII) is a character set which uses 7 bits to encode each character.
This means each character is assigned a binary number from 0000000₂ (0₁₀) to 1111111₂ (127₁₀), and therefore ASCII can encode a total of 128 different characters.
Table 1 shows some ASCII characters and their encodings in binary, hexadecimal, and decimal.
Table 1
CHAR | BIN | HEX | DEC |
---|---|---|---|
Space | 010 0000₂ | 20₁₆ | 32₁₀ |
0 | 011 0000₂ | 30₁₆ | 48₁₀ |
1 | 011 0001₂ | 31₁₆ | 49₁₀ |
9 | 011 1001₂ | 39₁₆ | 57₁₀ |
A | 100 0001₂ | 41₁₆ | 65₁₀ |
B | 100 0010₂ | 42₁₆ | 66₁₀ |
Z | 101 1010₂ | 5A₁₆ | 90₁₀ |
a | 110 0001₂ | 61₁₆ | 97₁₀ |
b | 110 0010₂ | 62₁₆ | 98₁₀ |
z | 111 1010₂ | 7A₁₆ | 122₁₀ |
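You can check these values with Python's built-in ord and chr functions, which convert between a character and its code value; this is just an illustrative sketch, not part of the table:

```python
# Character -> code value, in decimal, hexadecimal, and 7-bit binary.
print(ord('A'))                  # 65
print(hex(ord('A')))             # 0x41
print(format(ord('A'), '07b'))   # 1000001

# Code value -> character.
print(chr(90))                   # 'Z'
```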
Some things to note about ASCII:
• The numbers (0 = 30₁₆ to 9 = 39₁₆), uppercase letters (A = 41₁₆ to Z = 5A₁₆), and lowercase letters (a = 61₁₆ to z = 7A₁₆) are all in order (see the sketch after this list)
• The characters before Space = 20₁₆ are all control characters (like Null = 00₁₆, Backspace = 08₁₆, and Escape = 1B₁₆)
• There are no letters from other alphabets, because there was no room (7 bits only allows for 128 different characters)
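Because the digits and letters sit in consecutive blocks, simple arithmetic on the code values works. The Python sketch below illustrates this; the variable names are invented for the example:

```python
# Digits '0'-'9' are consecutive, so subtracting ord('0') gives the digit's value.
digit_value = ord('7') - ord('0')     # 7

# Uppercase and lowercase letters differ by a fixed offset of 32 (20 in hex),
# so adding 32 to an uppercase code value gives the lowercase letter.
lowercase_b = chr(ord('B') + 32)      # 'b'

print(digit_value, lowercase_b)       # 7 b
```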
Unicode is a different character set which uses more bits per character, typically stored as multiple bytes.
Unicode can therefore encode many more characters than ASCII can: it covers every major language in the world. It can even encode emojis 🤯!
The first 128 characters in Unicode are the same as ASCII. This means that if a character is in ASCII, it will have the same encoding in Unicode.
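Python strings are Unicode, so ord also shows this: an ASCII character keeps its ASCII value as its Unicode code point, while other characters get code points above 127. A small sketch:

```python
# 'A' keeps its ASCII value 65 (41 in hex) as its Unicode code point.
print(ord('A'))          # 65

# Characters outside ASCII have code points above 127.
print(hex(ord('é')))     # 0xe9
print(hex(ord('🤯')))    # 0x1f92f
```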
What is a disadvantage of using Unicode instead of ASCII?
More storage will be required, as Unicode uses more bits per character than ASCII does.
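For a rough sense of the extra storage, the sketch below compares an ASCII encoding of a short string with UTF-32, a fixed-width Unicode encoding chosen here purely for illustration:

```python
text = 'Hello'

# ASCII: one byte per character.
print(len(text.encode('ascii')))    # 5

# UTF-32: four bytes per character, plus a 4-byte byte-order mark in Python.
print(len(text.encode('utf-32')))   # 24
```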