3.09 - Run-Length Encoding

Run-Length Encoding (RLE) is a form of lossless compression where "runs" (groups) of consecutive data which have the same value are grouped together in length/data pairs.

Basically, that means we group together identical items which are next to each other, and then we store each group as its length and the item.

So, if we were compressing the string "aabbbacc" using RLE, we would use the following steps:

Group together runs of identical characters:
"aa", "bbb", "a", "cc"
Get the length of each group:
"aa" (2), "bbb" (3), "a" (1), "cc" (2)
Put them into length/character pairs:
(2, a) (3, b) (1, a) (2, c)

If we used 7-bit ASCII to encode the string, we would have needed 7 × 8 = 56 bits.

Assuming we use 2 bits for the number and 7 bits for the character, using RLE we need (2 + 7) × 4 = 36 bits.

This means that RLE saved 20 bits in this example.

RLE can also be used on images.

Diagram 1 shows an example 10×7 black-and-white image.

Diagram 1

With 0 representing black and 1 representing white, the uncompressed data would look like this (70 bits):

0 0 0 0 0 0 1 0 0 0 0 1 1 1 1 0 1 0 1 0 0 1 1 1 1 0 1 0 0 0 0 1 1 1 1 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 1 1 0 0 1 1 1 0 1 0 1 0 0 0 0 1 1 0 0 0

However, with RLE, the compressed file would look like this (145 bits assuming 4 bits per number):

(6, 0) (1, 1) (4, 0) (4, 1) (1, 0) (1, 1) (1, 0) (1, 1) (2, 0) (4, 1) (1, 0) (1, 1) (4, 0) (4, 1) (1, 0) (1, 1) (9, 0) (1, 1) (3, 0) (2, 1) (2, 0) (3, 1) (1, 0) (1, 1) (1, 0) (1, 1) (4, 0) (2, 1) (3, 0)

Yes: in this example the RLE version is longer than the original. This does happen sometimes if there aren't many long runs of identical items.

Use RLE to encode the following binary sequence: 1111 1100 0001 1010.

(6, 1) (5, 0) (2, 1) (1, 0) (1, 1) (1, 0)