## Floating-point multiplication

*28. Nov '13*

# Introduction

In computers, real numbers are represented in floating point format.
Usually this means that the number is split into an *exponent* and a *fraction*,
the latter also known as the *significand* or *mantissa*:
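
$$
x = (-1)^{\text{sign}} \times \text{mantissa} \times \text{base}^{\text{exponent}}
$$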

The mantissa lies within the range 0 .. base. Usually 2 is used as the base, which means the mantissa has to be within 0 .. 2. In the case of normalized numbers the mantissa lies within 1 .. 2 to take full advantage of the precision this format offers.

For instance, Pi can be rewritten as follows:
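
$$
\pi = 3.14159265\ldots = 1.57079633\ldots \times 2^{1}
$$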

# Single-precision floating point numbers

Most modern computers use the IEEE 754 standard to represent floating-point
numbers. One of the most commonly used formats is the *binary32*
format of IEEE 754:

```
sign   exponent (8 bits)   fraction/significand/mantissa (23 bits)
 |       /       \           /                                   \
 0    1 0 0 0 0 0 0 0    1 0 0 1 0 0 1 0 0 0 0 1 1 1 1 1 1 0 1 1 0 1 1
```
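
For reference, this bit pattern can be reproduced with a couple of lines of Python using the standard `struct` module (a quick sketch; Python's own floats are binary64, so `pack` first rounds the value to binary32):

```python
import math
import struct

# Round math.pi to binary32 and show its raw bit pattern.
bits = struct.unpack('>I', struct.pack('>f', math.pi))[0]
print(format(bits, '032b'))  # 01000000010010010000111111011011
```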

Note that the exponent is encoded using an offset-binary representation, which means it is always off by 127. So while 10000000 in binary would normally be 128 in decimal, in single-precision the value of the exponent is:
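
$$
e = 10000000_2 - 127 = 128 - 127 = 1
$$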

The same goes for the fraction bits: while 10010010000111111011011 in binary would normally evaluate to 4788187 in decimal, in single-precision numbers the bit weights run from 2^-1 down to 2^-23, and an implicit leading 1 is added in front:
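
$$
\begin{aligned}
\text{fraction} &= 10010010000111111011011_2 \times 2^{-23} = \frac{4788187}{2^{23}} \approx 0.5707964 \\
\text{value} &= (-1)^{0} \times (1 + 0.5707964) \times 2^{1} \approx 3.1415927
\end{aligned}
$$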

# Multiplication of single-precision numbers

Multiplication of such numbers can be tricky. In this example, let's use the following numbers:

Normalized values and biased exponents:

The exponents:

The numbers in IEEE 754 *binary32*:

The mantissas can be rewritten as follows, with the implicit leading 1 made explicit, totaling 24 bits per operand:

Their multiplication totals 48 bits:

Which has to be truncated to 24 bits:

The exponents 2 and -2 can easily be summed, so the only thing left to do is to normalize the fraction, which means that the resulting number is:

Which could be written in IEEE 754 *binary32* format as:
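
Putting these steps together, the whole procedure can be sketched in a few lines of Python. This is only a minimal illustration: it handles normal numbers only, ignores zero, subnormals, infinities and NaN, and truncates the product instead of rounding to nearest. The operands 6.5 and 0.3125 are made-up values (with exponents 2 and -2) used purely for demonstration:

```python
import struct

def f32_bits(x):
    """Bit pattern of a Python float rounded to binary32."""
    return struct.unpack('>I', struct.pack('>f', x))[0]

def bits_f32(b):
    """Interpret a 32-bit pattern as a binary32 value."""
    return struct.unpack('>f', struct.pack('>I', b))[0]

def mul_binary32(a, b):
    xa, xb = f32_bits(a), f32_bits(b)

    sign = (xa >> 31) ^ (xb >> 31)       # sign of the product
    ea = ((xa >> 23) & 0xFF) - 127       # unbiased exponents
    eb = ((xb >> 23) & 0xFF) - 127
    ma = (xa & 0x7FFFFF) | 0x800000      # 24-bit mantissas with the
    mb = (xb & 0x7FFFFF) | 0x800000      # implicit leading 1 restored

    prod = ma * mb                       # 48-bit product of the mantissas
    exp = ea + eb                        # exponents are simply summed

    if prod & (1 << 47):                 # product is in [2, 4):
        prod >>= 1                       # normalize by shifting right
        exp += 1                         # and bumping the exponent
    frac = (prod >> 23) & 0x7FFFFF       # truncate to 24 bits and drop
                                         # the implicit leading 1

    return bits_f32((sign << 31) | ((exp + 127) << 23) | frac)

print(mul_binary32(6.5, 0.3125))  # 2.03125
print(6.5 * 0.3125)               # 2.03125
```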

# Multiplication of double-precision numbers

The IEEE 754 standard also specifies a 64-bit representation of floating-point
numbers called *binary64*, also known as double-precision floating-point format.

```
sign   exponent (11 bits)   fraction aka significand aka mantissa (52 bits)
 |       /        \           /                                          \
 0    10000000000        1001001000011111101101010100010001000010110100011000
```
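
As with *binary32*, the pattern above can be reproduced with the standard `struct` module; a Python float already is a binary64 value, so no rounding step is needed (again just a quick sketch):

```python
import math
import struct

# Raw binary64 bit pattern of pi.
bits = struct.unpack('>Q', struct.pack('>d', math.pi))[0]
print(format(bits, '064b'))
# 0100000000001001001000011111101101010100010001000010110100011000
```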

Compared to the *binary32* representation, 3 bits are added to the exponent and 29 to the mantissa:

```
0 10000000000 1001001000011111101101010100010001000010110100011000
0 10000000 10010010000111111011011
```

Thus pi can be rewritten with higher precision:
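
$$
\begin{aligned}
\pi &\approx 1.1001001000011111101101010100010001000010110100011000_2 \times 2^{1} \\
    &\approx 1.5707963267948966 \times 2^{1} = 3.141592653589793
\end{aligned}
$$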

The multiplication with the numbers presented earlier:

Yields the following *binary64* representation:

The fraction operands are 53 bits each:

And their multiplication is 106 bits long:

Which of course means that it has to be truncated to 53 bits:

The exponent is handled as in single-precision arithmetic, thus the resulting number in *binary64* format is:

Which converted to decimal is:

# Conclusion

Expected result:

Single-precision result:

Double-precision result:

As can be seen, single-precision arithmetic distorts the result around the 6th fraction digit, whereas the double-precision result diverges only around the 15th fraction digit.
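
The same effect can be reproduced without any bit fiddling by rounding intermediate values to binary32 with the `struct` module. The operands below are made up for illustration; the point is only how many digits survive in each format:

```python
import struct

def to_f32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack('>f', struct.pack('>f', x))[0]

a, b = 3.141592653589793, 2.718281828459045     # hypothetical operands

double_result = a * b                           # plain binary64 multiplication
single_result = to_f32(to_f32(a) * to_f32(b))   # simulated binary32 multiplication

print(double_result)  # binary64 carries roughly 15-16 significant decimal digits
print(single_result)  # binary32 carries only about 7 significant decimal digits
```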