Summation Kahan and Klein overview.
The float and double types follow the IEEE 754 standard, which states that a floating-point number is represented by a sign S, an exponent E, and a mantissa M which can be converted to the real value by the following rule:
V = (-1) S * 1.M * 2E - Ebias
Computer can’t represent every real number in memory: the range and the precision depends on the number of bits that we have. In table you can see the main characteristics of the 32-bit, 64-bit, and 80-bit floating-point numbers.
Sign | Exponent | Mantissa | Digits | Ebias | |
---|---|---|---|---|---|
32bit | 1 | 8 | 23 | ≈7.2 | 127 |
64bit | 1 | 11 | 52 | ≈15.9 | 1023 |
81bit | 1 | 15 | 64 | ≈19.2 | 16383 |
Most of the classic arithmetic rules don’t work with floating-point numbers. Here is one of the most famous IEEE 754 equations: 0.1d + 0.2 ≠ 0.3d
We have such situations because 0.1d, 0.2d, and 0.3d can’t be perfectly presented in IEEE 754 notation:
0.1d ~ 0.100000000000000005551115123125783
+
0.2d ~ 0.200000000000000011102230246251565
=
0.300000000000000044408920985006262
0.3d ~ 0.299999999999999988897769753748435
Many arithmetic rules don’t work with float and double in general:
- (a + b) + c ≠ a + (b + c)
- (a * b) * c ≠ a * (b * c)
- (a + b) * c ≠ a * c + b * c
- ax + y ≠ ax * ay
Lets consider the following sample:
using System;
using System.Text;
using MatrixDotNet;
using MatrixDotNet.Extensions;
namespace Samples.logs.AbsorptionSample
{
public class AbsorptionSampleDocs
{
public static void Run()
{
// initialize Matrix.
Matrix<double> matrix64Klein = new double[5, 5];
Matrix<double> matrix64Kahan = new double[5, 5];
Matrix<decimal> matrix128 = new decimal[5, 5];
// assign max value for demonstration absorption.
matrix64Kahan[0, 0] = Math.Pow(10,16);
matrix64Klein[0, 0] = Math.Pow(10,16);
matrix128[0, 0] = (decimal) Math.Pow(10,28);
// make error
for (int i = 0; i < matrix64Klein.Rows; i++)
{
for (int j = 1; j < matrix64Klein.Columns; j++)
{
matrix64Kahan[i, j] = 1;
matrix64Klein[i, j] = 0.1;
matrix128[i, j] = 0.1m;
}
}
// compare algorithms of summation first case
double defaultSum0 = matrix64Kahan.Sum();
double kahanSum0 = matrix64Kahan.GetKahanSum();
double kleinSum0 = matrix64Kahan.GetKleinSum();
// compare algorithms of summation second case
double defaultSum1 = matrix64Klein.Sum();
double kahanSum1 = matrix64Klein.GetKahanSum();
double kleinSum1 = matrix64Klein.GetKleinSum();
// compare algorithms of summation third case
decimal defaultSum2 = matrix128.Sum();
decimal kahanSum2 = matrix128.GetKahanSum();
decimal kleinSum2 = matrix128.GetKleinSum();
}
}
}
As you can see we have three matrix with 64 and 128 bit, so next step assign max values for this matrices for demonstration hit in absorption.
For invoke Klein's or Kahan's algorithm must invoke method GetKahanSum()
and GetKleinSum()
Klein's algorithm have the most accuracy summation of matrix than Kahan's, however if values not big you can use Kahan's algorithm.
Output
64 bit
Default sum: 10,000,000,000,000,012.00
Kahan sum: 10,000,000,000,000,020.00
Klein sum: 10,000,000,000,000,020.00
64 bit
Default sum: 10,000,000,000,000,000.00
Kahan sum: 10,000,000,000,000,000.00
Klein sum: 10,000,000,000,000,002.00
128 bit
Default sum: 10,000,000,000,000,000,000,000,000,000.00
Default sum: 10,000,000,000,000,000,000,000,000,000.00
Klein sum: 10,000,000,000,000,000,000,000,000,002.00
Thus, the default sum counts incorrectly, the runtime rounds it because the floating type precision is not enough to handle digit numbers. Klein calculates with less error, but at the same time it works slower than Kahan.
Note
If you didn't find answer for your question on this page, ask it on gitter.