GCSE Link: 3.02 (Binary and Hexadecimal)

You should already be familiar with standard form, m × 10^x, where m is a multiplier between 1 and 10, and x is an integer exponent. Standard form is a much more concise way of writing very large and very small numbers.

This conciseness is the basis of Floating Point Representation. However, because computers use binary, we use m × 2^x instead. Also, in AQA's version of Floating Point Representation, the magnitude of m is between 0 and 1, not between 1 and 2. (Note that the IEEE 754 standard used in real life is different to what you need to learn.)
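As a tiny illustration of that difference, here is the number 5 written both ways in Python (5 is just an arbitrary example, not taken from the exam specification):

```python
# The same number written in the two styles mentioned above.
print(0.625 * 2 ** 3)   # 5.0  (m between 0 and 1, as in the AQA format)
print(1.25 * 2 ** 2)    # 5.0  (m between 1 and 2, as in IEEE 754)
```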

Floating Point Representation is a way of representing fractions in binary that is different from fixed point.

In floating point, we have a fractional mantissa (multiplier) and an integer exponent (both signed). You will be given the lengths of both of these components if asked to convert a number to its floating-point representation.
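To see how such a pair of components encodes a value, here is a minimal Python sketch. It assumes the AQA-style layout (a two's complement fractional mantissa with the binary point just after the sign bit, and a two's complement integer exponent); the function name from_aqa_float and the example bit patterns are made up for illustration.

```python
def from_aqa_float(mantissa, exponent):
    """Work out the value of a floating point number given as two bit strings:
    a two's complement fractional mantissa (binary point just after the sign
    bit) and a two's complement integer exponent."""
    def twos_complement(bits):
        value = int(bits, 2)
        if bits[0] == "1":            # a leading 1 means the number is negative
            value -= 2 ** len(bits)
        return value

    # The mantissa is a fraction, so divide by 2^(number of bits after the point).
    m = twos_complement(mantissa) / 2 ** (len(mantissa) - 1)
    return m * 2 ** twos_complement(exponent)

print(from_aqa_float("01100000", "0011"))   # 0.75 × 2^3  =  6.0
print(from_aqa_float("10100000", "0011"))   # -0.75 × 2^3 = -6.0
```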


To convert a number to its floating-point representation, use the following steps. We'll take the example of 23.5, with an 8-bit mantissa and a 4-bit exponent.

1. Convert the number to fixed-point binary: 23.5 = 10111.1
2. Normalise it by moving the binary point to just after the sign bit, counting how many places it moves: 10111.1 = 0.101111 × 2^5
3. Write the mantissa and the exponent in two's complement, padded to the lengths you were given: mantissa 0.1011110 (stored as 01011110), exponent 0101
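If you want to check a conversion like this, here is a minimal Python sketch of the same idea. It is an illustration rather than the exam method: it handles positive values only, truncates any bits that don't fit in the mantissa, and the name to_aqa_float is made up.

```python
def to_aqa_float(value, mantissa_bits=8, exponent_bits=4):
    """Convert a positive number to a two's complement fractional mantissa
    and a two's complement exponent, returned as bit strings.
    Sketch only: positive values, spare bits truncated, no range checks."""
    assert value > 0

    # Normalise: scale the value into [0.5, 1) so that, as a binary fraction,
    # it starts 0.1...; each halving or doubling adjusts the exponent by one.
    exponent = 0
    while value >= 1:
        value /= 2
        exponent += 1
    while value < 0.5:
        value *= 2
        exponent -= 1

    # Read off the fraction bits one at a time after the positive sign bit.
    mantissa = "0"
    for _ in range(mantissa_bits - 1):
        value *= 2
        bit = int(value)
        mantissa += str(bit)
        value -= bit

    # Two's complement exponent, padded to the required number of bits.
    exp = format(exponent & (2 ** exponent_bits - 1), f"0{exponent_bits}b")
    return mantissa, exp

print(to_aqa_float(23.5))   # ('01011110', '0101'), i.e. 0.1011110 × 2^0101
```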
Normalisation is the process of maximising the precision of values that can be represented in floating-point.

We want to use the whole mantissa to make our value as precise as possible, so we need to make sure the first bit after the sign bit is significant rather than just repeating the sign. This means that all normalised mantissas should start with either 01 (for positive numbers) or 10 (for negative numbers).
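For example, here is a small Python sketch of normalising a positive, non-zero mantissa by shifting its bits left; the helper name normalise and the example bit pattern are made up.

```python
def normalise(mantissa, exponent):
    """Normalise a positive, non-zero mantissa (bit string, sign bit first) by
    shifting its bits left until it starts '01', reducing the exponent by 1
    for each place shifted."""
    while not mantissa.startswith("01"):
        # Discard the redundant bit after the sign bit (it just repeats the
        # sign), shift the remaining bits left and pad with a 0 on the right.
        mantissa = mantissa[0] + mantissa[2:] + "0"
        exponent -= 1
    return mantissa, exponent

# 0.0010111 × 2^4 and 0.1011100 × 2^2 are both 2.875, but the normalised
# form has two more fraction bits free for extra precision.
print(normalise("00010111", 4))   # ('01011100', 2)
```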


We can store a much larger range of numbers with floating point than with fixed point, because we only keep a set number of significant figures rather than a set number of places after the point. This is a reasonable trade-off: for extremely large numbers, we don't really care about the places after the point anyway.
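As a rough comparison using the 8-bit mantissa and 4-bit exponent from the earlier example (the 6 integer / 6 fraction split for the fixed point number below is my own choice for illustration):

```python
# Extremes of the 8-bit mantissa / 4-bit exponent floating point format.
largest = (127 / 128) * 2 ** 7    # mantissa 01111111, exponent 0111 (+7)
smallest = (1 / 2) * 2 ** -8      # mantissa 01000000, exponent 1000 (-8)
print(largest, smallest)          # 127.0  0.001953125

# The same 12 bits as fixed point with, say, 6 integer and 6 fraction bits
# only cover about 0.015625 to 63.984375: a narrower range at both ends,
# and the gap widens quickly as the exponent is given more bits.
print(63 + 63 / 64, 1 / 64)       # 63.984375  0.015625
```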



Can 0.3 be stored exactly in binary?

No: any fraction whose denominator (in its lowest terms) is not a power of 2 is recurring in binary. Since 0.3 is 3/10, and 10 is not a power of two, it can never be stored exactly (it is 0.0100110011... recurring). In the single-precision floating-point format, the closest value to 0.3 is 0.3000000119...
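You can check both of these claims with ordinary Python, which itself stores numbers in IEEE 754 double precision; the struct module is used here simply to round-trip 0.3 through single precision.

```python
import struct

# Generate the first few bits of 0.3 in binary: the block 0011 recurs forever.
value, bits = 0.3, ""
for _ in range(12):
    value *= 2
    bits += str(int(value))
    value -= int(value)
print("0." + bits)                    # 0.010011001100

# Round-trip 0.3 through IEEE 754 single precision to find the value that is
# actually stored, i.e. the nearest representable number to 0.3.
stored = struct.unpack("f", struct.pack("f", 0.3))[0]
print(stored)                         # 0.30000001192092896
```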