When you press the keys on your keyboard, the keyboard sends binary information to the computer. How does the computer know what the binary information means?

A character set is a collection of characters that a computer can recognise from their binary representation.

Computers can therefore use this character set to convert between characters and binary numbers.
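
To make this concrete, here is a minimal Python sketch: the built-in ord() function converts a character to the number used to represent it, and chr() converts back the other way.

```python
# A character set maps each character to a number, and that number is what
# the computer actually stores and transmits in binary.
code = ord("A")     # character -> number
print(code)         # 65
print(bin(code))    # 0b1000001 (the binary pattern behind the character)
print(chr(code))    # number -> character: 'A'
```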

Modern computers use two main character sets: ASCII and Unicode.

The American Standard Code for Information Interchange (ASCII) is a character set which uses 7 bits to encode each character.

This means each character is assigned a binary number from 0000000₂ (0₁₀) to 1111111₂ (127₁₀), and therefore ASCII can encode a total of 128 different characters.

Table 1 shows some ASCII characters and their encodings in binary, hexadecimal, and decimal.

Table 1

Character   Binary     Hexadecimal   Decimal
Space       0100000₂   20₁₆          32₁₀
0           0110000₂   30₁₆          48₁₀
1           0110001₂   31₁₆          49₁₀
9           0111001₂   39₁₆          57₁₀
A           1000001₂   41₁₆          65₁₀
B           1000010₂   42₁₆          66₁₀
Z           1011010₂   5A₁₆          90₁₀
a           1100001₂   61₁₆          97₁₀
b           1100010₂   62₁₆          98₁₀
z           1111010₂   7A₁₆          122₁₀
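
The rows of Table 1 can be reproduced with a short Python sketch: ord() returns a character's decimal code, and format() renders that code as a 7-bit binary string or a two-digit hexadecimal string.

```python
# Print a few rows of Table 1: character, binary, hexadecimal, decimal.
for char in [" ", "0", "9", "A", "Z", "a", "z"]:
    code = ord(char)
    print(char, format(code, "07b"), format(code, "02X"), code)
```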


Some things to note about ASCII:
• The numbers (0 = 30₁₆ to 9 = 39₁₆), uppercase letters (A = 41₁₆ to Z = 5A₁₆), and lowercase letters (a = 61₁₆ to z = 7A₁₆) are all in order (the sketch after this list makes use of this)
• The characters before Space = 20₁₆ are all control characters (like Null = 00₁₆, Backspace = 08₁₆, and Escape = 1B₁₆)
• There are no letters from other alphabets, because there was no room (7 bits only allows for 128 different characters)
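
Because of this ordering, simple arithmetic on the codes is often all that is needed, as in this small Python sketch:

```python
# Digit character -> numeric value: subtract the code of '0' (30 in hex).
print(ord("7") - ord("0"))     # 7

# Uppercase -> lowercase: the two blocks of letters are 20 in hex (32) apart.
print(chr(ord("G") + 0x20))    # g
```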

Unicode is a different character set which can use multiple bytes to encode each character.

Unicode can therefore encode many more characters than ASCII can: it covers every major language in the world. It can even encode emojis 🤯!
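
As a quick illustration, the Python sketch below encodes a Greek letter, a Chinese character, and an emoji using UTF-8 (one common Unicode encoding, chosen here as an example): each has a code point well beyond ASCII's 127 and needs more than one byte.

```python
# Characters outside ASCII: their code points and UTF-8 sizes.
for char in ["Ω", "你", "🤯"]:
    print(char, hex(ord(char)), len(char.encode("utf-8")), "bytes")
# Ω 0x3a9 2 bytes
# 你 0x4f60 3 bytes
# 🤯 0x1f92f 4 bytes
```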

The first 128 characters in Unicode are the same as ASCII. This means that if a character is in ASCII, it will have the same encoding in Unicode.
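
This compatibility is easy to check in Python: the same byte values are valid in both character sets, so decoding them as ASCII or as UTF-8 (a common Unicode encoding, used here as an example) gives the same text.

```python
# The byte values 65, 66, 67 mean 'ABC' in ASCII and in Unicode alike.
data = bytes([65, 66, 67])
print(data.decode("ascii"))   # ABC
print(data.decode("utf-8"))   # ABC
```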



What is a disadvantage of using Unicode instead of ASCII?

More storage will be required, as Unicode uses more bits per character than ASCII does.
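
A rough Python sketch of the size difference, using UTF-32 as an example of a fixed-width Unicode encoding (four bytes per character; the exact overhead depends on which Unicode encoding is chosen):

```python
text = "hello"
print(len(text.encode("ascii")))   # 5 bytes: one byte per character
print(len(text.encode("utf-32")))  # 24 bytes: a 4-byte byte-order mark + 4 bytes per character
```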