When you press the keys on your keyboard, the keyboard sends binary information to the computer. How does the computer know what the binary information means?
A character set is a collection of characters that a computer can recognise from their binary representation.
Computers can therefore use this character set to convert between characters and binary numbers.
Modern computers use two main character sets: ASCII and Unicode.
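To illustrate the idea, here is a minimal Python sketch of a character set as a lookup table; the table and its name are made up for this example (the values happen to match ASCII):

```python
# A toy "character set": a tiny table mapping characters to code values.
toy_charset = {'A': 65, 'B': 66, 'C': 67}

code = toy_charset['B']        # character -> number
print(format(code, '08b'))     # 01000010 -- the binary pattern the computer stores
print(code)                    # 66
```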
The American Standard Code for Information Interchange (ASCII) is a character set which uses 7 bits to encode each character.
This means each character is assigned a binary number from 0000000₂ (0₁₀) to 1111111₂ (127₁₀), and therefore ASCII can encode a total of 128 different characters.
Table 1 shows some ASCII characters and their encodings in binary, hexadecimal, and decimal.
Table 1
CHAR | BIN | HEX | DEC |
---|---|---|---|
Space | 010 0000₂ | 20₁₆ | 32₁₀ |
0 | 011 0000₂ | 30₁₆ | 48₁₀ |
1 | 011 0001₂ | 31₁₆ | 49₁₀ |
9 | 011 1001₂ | 39₁₆ | 57₁₀ |
A | 100 0001₂ | 41₁₆ | 65₁₀ |
B | 100 0010₂ | 42₁₆ | 66₁₀ |
Z | 101 1010₂ | 5A₁₆ | 90₁₀ |
a | 110 0001₂ | 61₁₆ | 97₁₀ |
b | 110 0010₂ | 62₁₆ | 98₁₀ |
z | 111 1010₂ | 7A₁₆ | 122₁₀ |
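You can check these values with Python's built-in ord and chr functions, which convert between a character and its code value; this is just an illustrative sketch, not part of the table:

```python
# Character -> code value, in decimal, hexadecimal, and 7-bit binary.
print(ord('A'))                  # 65
print(hex(ord('A')))             # 0x41
print(format(ord('A'), '07b'))   # 1000001

# Code value -> character.
print(chr(90))                   # 'Z'
```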
Some things to note about ASCII:
• The numbers (0 = 30₁₆ to 9 = 39₁₆), uppercase letters (A = 41₁₆ to Z = 5A₁₆), and lowercase letters (a = 61₁₆ to z = 7A₁₆) are all in order (see the sketch after this list)
• The characters before Space = 20₁₆ are all control characters (like Null = 00₁₆, Backspace = 08₁₆, and Escape = 1B₁₆)
• There are no letters from other alphabets, because there was no room (7 bits only allows for 128 different characters)
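Because the digits and letters sit in consecutive blocks, simple arithmetic on the code values works. The Python sketch below illustrates this; the variable names are invented for the example:

```python
# Digits '0'-'9' are consecutive, so subtracting ord('0') gives the digit's value.
digit_value = ord('7') - ord('0')     # 7

# Uppercase and lowercase letters differ by a fixed offset of 32 (20 in hex),
# so adding 32 to an uppercase code value gives the lowercase letter.
lowercase_b = chr(ord('B') + 32)      # 'b'

print(digit_value, lowercase_b)       # 7 b
```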
Unicode is a different character set which uses more bits per character, typically stored as multiple bytes.
Unicode can therefore encode many more characters than ASCII can: it covers every major language in the world. It can even encode emojis 🤯!
The first 128 characters in Unicode are the same as ASCII. This means that if a character is in ASCII, it will have the same encoding in Unicode.
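Python strings are Unicode, so ord also shows this: an ASCII character keeps its ASCII value as its Unicode code point, while other characters get code points above 127. A small sketch:

```python
# 'A' keeps its ASCII value 65 (41 in hex) as its Unicode code point.
print(ord('A'))          # 65

# Characters outside ASCII have code points above 127.
print(hex(ord('é')))     # 0xe9
print(hex(ord('🤯')))    # 0x1f92f
```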
What is a disadvantage of using Unicode instead of ASCII?
More storage will be required, as Unicode uses more bits per character than ASCII does.
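For a rough sense of the extra storage, the sketch below compares an ASCII encoding of a short string with UTF-32, a fixed-width Unicode encoding chosen here purely for illustration:

```python
text = 'Hello'

# ASCII: one byte per character.
print(len(text.encode('ascii')))    # 5

# UTF-32: four bytes per character, plus a 4-byte byte-order mark in Python.
print(len(text.encode('utf-32')))   # 24
```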