When you press the keys on your keyboard, the keyboard sends binary information to the computer. How does the computer know what the binary information means?
A character set is a collection of characters that a computer can recognise from their binary representation.
Computers can therefore use this character set to convert between characters and binary numbers.
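As a minimal sketch of this conversion, Python's built-in `ord` and `chr` map between a character and its number, and `format` writes that number in binary. The helper names `char_to_binary` and `binary_to_char`, and the default width of 8 bits, are assumptions made for illustration.

```python
def char_to_binary(ch: str, bits: int = 8) -> str:
    """Return the code of a single character as a zero-padded binary string."""
    return format(ord(ch), f"0{bits}b")


def binary_to_char(binary: str) -> str:
    """Return the character whose code is the given binary string."""
    return chr(int(binary, 2))


print(char_to_binary("A"))         # 01000001
print(binary_to_char("01000001"))  # A
```

Converting in both directions gives back the original character, which is what lets a computer store text as binary and display it again later.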
Modern computers use two main character sets: ASCII and Unicode.
The American Standard Code for Information Interchange (ASCII) is a character set which uses 7 bits to encode each character.
This means each character is assigned a binary number from 0000000₂ (0₁₀) to 1111111₂ (127₁₀), and therefore ASCII can encode a total of 128 different characters.
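The count of 128 follows from the number of distinct 7-bit patterns, 2⁷. A short sketch that lists every pattern to check this (the names `BITS` and `codes` are illustrative only):

```python
BITS = 7

# Every value from 0 up to 2**7 - 1, written as a 7-bit binary string.
codes = [format(n, f"0{BITS}b") for n in range(2 ** BITS)]

print(len(codes))  # 128
print(codes[0])    # 0000000
print(codes[-1])   # 1111111
```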
Table 1 shows some ASCII characters and their encodings in binary, hexadecimal, and decimal.
Table 1
| CHAR | BIN | HEX | DEC |
|---|---|---|---|
| Space | 010 0000₂ | 20₁₆ | 32₁₀ |
| 0 | 011 0000₂ | 30₁₆ | 48₁₀ |
| 1 | 011 0001₂ | 31₁₆ | 49₁₀ |
| 9 | 011 1001₂ | 39₁₆ | 57₁₀ |
| A | 100 0001₂ | 41₁₆ | 65₁₀ |
| B | 100 0010₂ | 42₁₆ | 66₁₀ |
| Z | 101 1010₂ | 5A₁₆ | 90₁₀ |
| a | 110 0001₂ | 61₁₆ | 97₁₀ |
| b | 110 0010₂ | 62₁₆ | 98₁₀ |
| z | 111 1010₂ | 7A₁₆ | 122₁₀ |
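The rows of Table 1 can be reproduced with a few lines of Python. This is only a sketch: `ascii_row` is a hypothetical helper, and the split of the 7 bits into a group of 3 and a group of 4 simply mirrors the table's layout.

```python
def ascii_row(ch: str) -> tuple[str, str, str]:
    """Return the (binary, hexadecimal, decimal) encodings of one ASCII character."""
    code = ord(ch)
    binary = format(code, "07b")              # 7 bits, zero-padded
    grouped = binary[:3] + " " + binary[3:]   # e.g. "100 0001"
    return grouped, format(code, "02X"), str(code)


for ch in " 019ABZabz":
    label = "Space" if ch == " " else ch
    print(label, *ascii_row(ch))
```

For example, `ascii_row("Z")` returns `("101 1010", "5A", "90")`, matching the table.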
Some things to note about ASCII:
• The numbers (0 = 30₁₆ to 9 = 39₁₆), uppercase letters (A = 41₁₆ to Z = 5A₁₆), and lowercase letters (a = 61₁₆ to z = 7A₁₆) are all in order
• The characters before Space = 20₁₆ are all control characters (like Null = 00₁₆, Backspace = 08₁₆, and Escape = 1B₁₆)
• There are no letters from other alphabets, because there was no room (7 bits only allows for 128 different characters)
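Because the digits and letters are stored in order, simple arithmetic on the codes is enough for some common tasks. The sketch below illustrates the first two notes; the function names are hypothetical, and Python's own `int()` and `str.upper()` would normally be used instead.

```python
def digit_value(ch: str) -> int:
    """Numeric value of a digit character, found from its distance after '0'."""
    return ord(ch) - ord("0")


def to_upper(ch: str) -> str:
    """Uppercase a lowercase ASCII letter by subtracting the fixed gap between cases."""
    if "a" <= ch <= "z":
        return chr(ord(ch) - (ord("a") - ord("A")))  # the gap is 0x61 - 0x41 = 0x20
    return ch


def is_control(ch: str) -> bool:
    """True for characters whose code comes before Space (0x20)."""
    return ord(ch) < 0x20


print(digit_value("7"))    # 7
print(to_upper("b"))       # B
print(is_control("\x1b"))  # True  (Escape)
print(is_control("A"))     # False
```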
Unicode is a different character set which can use multiple bytes for each character.
Unicode can therefore encode many more characters than ASCII can: it covers every major language in the world. It can even encode emojis 🤯!
The first 128 characters in Unicode are the same as ASCII. This means that if a character is in ASCII, it will have the same encoding in Unicode.
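This overlap can be checked in Python by encoding the same character both ways. The text does not name a particular Unicode encoding, so the use of UTF-8 here is an assumption, and `same_in_ascii_and_utf8` is a hypothetical helper.

```python
def same_in_ascii_and_utf8(ch: str) -> bool:
    """True if the character is in ASCII and its UTF-8 bytes match its ASCII byte."""
    try:
        return ch.encode("ascii") == ch.encode("utf-8")
    except UnicodeEncodeError:
        # The character is not in ASCII at all, so there is nothing to compare.
        return False


print(same_in_ascii_and_utf8("A"))  # True
print(same_in_ascii_and_utf8("é"))  # False
```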
What is a disadvantage of using Unicode instead of ASCII?
Answer: More storage will be required, as Unicode uses more bits per character than ASCII does.
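The storage cost can be measured by counting the bytes an encoding produces. The exact figure depends on which Unicode encoding is chosen; UTF-32 and UTF-8 are not named in the text above and are used here as assumptions, and `byte_count` is a hypothetical helper.

```python
def byte_count(text: str, encoding: str) -> int:
    """Number of bytes needed to store the text in the given encoding."""
    return len(text.encode(encoding))


print(byte_count("Hello", "ascii"))      # 5   (one byte per character)
print(byte_count("Hello", "utf-32-be"))  # 20  (four bytes per character)
print(byte_count("🤯", "utf-8"))         # 4   (a character that ASCII cannot store)
```

In this sketch the same five-letter word takes four times as much space in the fixed-width Unicode encoding as it does in ASCII.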