Theorem 9

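If $x$ and $y$ are positive floating-point numbers in a format with parameters $\beta$ and $p$, and if subtraction is done with $p + 1$ digits (i.e. one guard digit), then the relative rounding error in the result is less than

$$\left(\frac{\beta}{2} + 1\right)\beta^{-p} = \left(1 + \frac{2}{\beta}\right)\varepsilon \le 2\varepsilon,$$

where $\varepsilon = \frac{\beta}{2}\beta^{-p}$ is the machine epsilon.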
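
The bound is easy to tabulate. The Python sketch below uses exact rational arithmetic (`fractions.Fraction`) to evaluate both closed forms for a few illustrative $(\beta, p)$ pairs and to confirm that they agree and never exceed $2\varepsilon$. The helper names `machine_epsilon` and `theorem9_bound` are hypothetical, and the sketch assumes $\varepsilon = \frac{\beta}{2}\beta^{-p}$.

```python
from fractions import Fraction


def machine_epsilon(beta: int, p: int) -> Fraction:
    """epsilon = (beta / 2) * beta**-p, kept exact as a rational number."""
    return Fraction(beta, 2) / beta ** p


def theorem9_bound(beta: int, p: int) -> Fraction:
    """Relative-error bound (beta / 2 + 1) * beta**-p from Theorem 9."""
    return (Fraction(beta, 2) + 1) / beta ** p


# Illustrative formats: binary p = 24 and p = 53, 3-digit decimal, 6-digit hex.
for beta, p in [(2, 24), (2, 53), (10, 3), (16, 6)]:
    eps = machine_epsilon(beta, p)
    bound = theorem9_bound(beta, p)
    assert bound == (1 + Fraction(2, beta)) * eps  # the two closed forms agree
    assert bound <= 2 * eps                        # and never exceed 2 * epsilon
    print(f"beta={beta} p={p}: bound / eps = {bound / eps}")
```

Each printed ratio is the factor $1 + 2/\beta$: 2 for $\beta = 2$, 6/5 for $\beta = 10$, and 9/8 for $\beta = 16$, so the guarantee is weakest (exactly $2\varepsilon$) in binary.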

Proof

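Interchange $x$ and $y$ if necessary so that $x > y$. It is also harmless to scale $x$ and $y$ so that $x$ is represented by $x_0.x_1 \ldots x_{p-1} \times \beta^0$. If $y$ is represented as $y_0.y_1 \ldots y_{p-1}$, then the difference is exact. If $y$ is represented as $0.y_1 \ldots y_p$, then the guard digit ensures that the computed difference will be the exact difference rounded to a floating-point number, so the rounding error is at most $\varepsilon$. In general, let $y = 0.\underbrace{0 \ldots 0}_{k}y_{k+1} \ldots y_{k+p}$ and let $\bar{y}$ be $y$ truncated to $p + 1$ digits. Then

$$y - \bar{y} \le (\beta - 1)(\beta^{-p-1} + \beta^{-p-2} + \cdots + \beta^{-p-k}).$$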

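To make the truncation concrete, here is a minimal Python sketch, assuming $x$ has already been scaled into $[1, \beta)$ so that keeping $p + 1$ digits of $y$ means rounding $y$ down to a multiple of $\beta^{-p}$. It checks $0 \le y - \bar{y} \le (\beta - 1)(\beta^{-p-1} + \cdots + \beta^{-p-k})$ for every 3-digit decimal $y$ with $k = 2$ leading zeros. The helpers `truncate_to_guard` and `tail_bound` are hypothetical names used only for illustration.

```python
from fractions import Fraction


def truncate_to_guard(y: Fraction, beta: int, p: int) -> Fraction:
    """y truncated to p + 1 digits, i.e. rounded down to a multiple of beta**-p.

    Assumes x has already been scaled into [1, beta), so beta**-p is the
    weight of the single guard digit just past x's last digit.
    """
    unit = Fraction(1, beta ** p)
    return (y // unit) * unit


def tail_bound(beta: int, p: int, k: int) -> Fraction:
    """(beta - 1) * (beta**(-p-1) + ... + beta**(-p-k))."""
    return (beta - 1) * sum(Fraction(1, beta ** (p + i)) for i in range(1, k + 1))


beta, p, k = 10, 3, 2  # y = 0.00 y3 y4 y5: k = 2 leading zeros, then p digits

# Every such y obeys the bound on the discarded digits ...
for m in range(beta ** (p - 1), beta ** p):
    y = Fraction(m, beta ** (k + p))
    assert 0 <= y - truncate_to_guard(y, beta, p) <= tail_bound(beta, p, k)

# ... and it is attained when all discarded digits equal beta - 1.
y = Fraction(999, 10 ** 5)                                  # 0.00999
assert truncate_to_guard(y, beta, p) == Fraction(9, 1000)   # 0.009
assert y - truncate_to_guard(y, beta, p) == tail_bound(beta, p, k)
```

The last two assertions show the inequality is attained when every discarded digit equals $\beta - 1$, so it cannot be made strict in general.
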
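From the definition of guard digit, the computed value of $x - y$ is $x - \bar{y}$ rounded to a floating-point number, that is, $(x - \bar{y}) + \delta$, where the rounding error $\delta$ satisfies

$$|\delta| \le \frac{\beta}{2}\beta^{-p}.$$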

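The guard-digit model used in the proof (form $x - \bar{y}$ exactly, then round once to $p$ digits) can be sketched in Python as follows. This is an illustrative model under stated assumptions, not how any particular hardware implements subtraction: it assumes $x \in [1, \beta)$, $0 < y < x$, and round-half-to-even for the final rounding (any round-to-nearest rule gives the same bound on $\delta$). `round_to_p_digits` and `guard_digit_subtract` are hypothetical helpers.

```python
from fractions import Fraction


def round_to_p_digits(r: Fraction, beta: int, p: int) -> Fraction:
    """Round r >= 0 to p significant base-beta digits, ties to even."""
    if r == 0:
        return r
    b = Fraction(beta)
    e = 0
    while r >= b ** (e + 1):  # find e with beta**e <= r < beta**(e + 1)
        e += 1
    while r < b ** e:
        e -= 1
    ulp = b ** (e - p + 1)
    return round(r / ulp) * ulp  # round() on a Fraction rounds half to even


def guard_digit_subtract(x: Fraction, y: Fraction, beta: int, p: int):
    """Model of x - y with one guard digit, as in the proof.

    Assumes x is scaled into [1, beta) and 0 < y < x. Returns the computed
    difference, y_bar, and the rounding error delta.
    """
    unit = Fraction(1, beta ** p)
    y_bar = (y // unit) * unit          # y truncated to p + 1 digits
    exact = x - y_bar                   # fits exactly in p + 1 digits
    computed = round_to_p_digits(exact, beta, p)
    return computed, y_bar, computed - exact


beta, p = 10, 3
x, y = Fraction(123, 100), Fraction(457, 10000)    # 1.23 and 0.0457
computed, y_bar, delta = guard_digit_subtract(x, y, beta, p)
assert y_bar == Fraction(45, 1000)                  # 0.045
assert computed == Fraction(118, 100)               # 1.185 rounds (half to even) to 1.18
assert abs(delta) <= Fraction(beta, 2) / beta ** p  # |delta| <= (beta/2) * beta**-p
```

For $\beta = 10$, $p = 3$, $x = 1.23$, $y = 0.0457$, the register holds $x - \bar{y} = 1.185$, a tie that rounds to $1.18$, so $|\delta| = 0.005 = \frac{\beta}{2}\beta^{-p}$ and the bound on $\delta$ is tight.
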
The exact difference is $x - y$, so the error is $(x - y) - (x - \bar{y} + \delta) = \bar{y} - y + \delta$. There are three cases:

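  • If $x - y \ge 1$, then the relative error is bounded by

    $$\frac{|\bar{y} - y + \delta|}{x - y} \le (y - \bar{y}) + |\delta| \le \beta^{-p}\left[(\beta - 1)(\beta^{-1} + \cdots + \beta^{-k}) + \frac{\beta}{2}\right] < \beta^{-p}\left(1 + \frac{\beta}{2}\right),$$

    since $x - y \ge 1$ and $(\beta - 1)(\beta^{-1} + \cdots + \beta^{-k}) = 1 - \beta^{-k} < 1$.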

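  • If $x - \bar{y} < 1$, then $\delta = 0$, because $x - \bar{y}$ is a multiple of $\beta^{-p}$ lying in $(0, 1)$ and therefore fits exactly in $p$ digits. Since the smallest that $x - y$ can be is

    $$1.0 - 0.\underbrace{0 \ldots 0}_{k}\underbrace{\rho \ldots \rho}_{p} > (\beta - 1)(\beta^{-1} + \cdots + \beta^{-k}), \quad \text{where } \rho = \beta - 1,$$

    the relative error is bounded by

    $$\frac{|\bar{y} - y + \delta|}{x - y} < \frac{(\beta - 1)\beta^{-p}(\beta^{-1} + \cdots + \beta^{-k})}{(\beta - 1)(\beta^{-1} + \cdots + \beta^{-k})} = \beta^{-p}.$$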

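  • If $x - y < 1$ but $x - \bar{y} \ge 1$, then $x - \bar{y} = 1$, because $x - \bar{y}$ is a multiple of $\beta^{-p}$ and $x - \bar{y} = (x - y) + (y - \bar{y}) < 1 + \beta^{-p}$. Hence $\delta = 0$, the bound derived for the case $x - \bar{y} < 1$ applies, and the relative error is again less than $\beta^{-p} < \beta^{-p}(1 + \beta/2)$.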
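
The case analysis can also be checked exhaustively on toy formats. The Python sketch below uses a hypothetical guard-digit model (truncate $y$ to a multiple of $\beta^{-p}$, form $x - \bar{y}$ exactly, round half to even to $p$ digits) with $x$ scaled into $[1, \beta)$, classifies every pair $0 < y < x$ into the three cases, and asserts the per-case facts used in the proof. The helper `check_theorem9` and the chosen formats and exponent ranges are illustrative assumptions; an exhaustive check of a tiny format supports, but does not replace, the proof.

```python
from fractions import Fraction


def round_to_p_digits(r: Fraction, beta: int, p: int) -> Fraction:
    """Round r > 0 to p significant base-beta digits, ties to even."""
    b = Fraction(beta)
    e = 0
    while r >= b ** (e + 1):  # find e with beta**e <= r < beta**(e + 1)
        e += 1
    while r < b ** e:
        e -= 1
    ulp = b ** (e - p + 1)
    return round(r / ulp) * ulp


def check_theorem9(beta: int, p: int, min_exp: int) -> dict:
    """Worst relative error seen in each case of the proof, over a toy format.

    x ranges over all p-digit numbers in [1, beta); y over all p-digit
    numbers with exponent min_exp..0 that are smaller than x.
    """
    b = Fraction(beta)
    unit = Fraction(1, beta ** p)  # weight of the guard digit
    mantissas = range(beta ** (p - 1), beta ** p)
    xs = [m * b ** (1 - p) for m in mantissas]
    ys = [m * b ** (e + 1 - p) for e in range(min_exp, 1) for m in mantissas]
    worst = {1: Fraction(0), 2: Fraction(0), 3: Fraction(0)}
    for x in xs:
        for y in ys:
            if y >= x:
                continue
            y_bar = (y // unit) * unit
            computed = round_to_p_digits(x - y_bar, beta, p)
            delta = computed - (x - y_bar)
            rel = abs((x - y) - computed) / (x - y)
            if x - y >= 1:
                case = 1
            elif x - y_bar < 1:
                case = 2
                assert delta == 0
            else:
                case = 3
                assert x - y_bar == 1 and delta == 0
            worst[case] = max(worst[case], rel)
    return worst


for beta, p, min_exp in [(2, 5, -7), (10, 2, -3)]:
    worst = check_theorem9(beta, p, min_exp)
    base = Fraction(1, beta ** p)                     # beta**-p
    assert worst[1] < base * (1 + Fraction(beta, 2))  # case x - y >= 1
    assert worst[2] < base and worst[3] < base        # the two other cases
    print(beta, p, {c: float(w / base) for c, w in worst.items()})
```

Each printed dictionary gives, per case, the worst relative error divided by $\beta^{-p}$; cases 2 and 3 stay below 1 and case 1 below $1 + \beta/2$. The exponent ranges are chosen low enough that case 3 (for example $x = 1$ with $y$ below $\beta^{-p}$) actually occurs.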

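When $\beta = 2$, the bound $(1 + 2/\beta)\varepsilon$ is exactly $2\varepsilon$, and it is attained in the limit as $p \to \infty$ by $x = 1 + 2^{2-p}$ and $y = 2^{1-p} - 2^{1-2p}$.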
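
The limiting example for $\beta = 2$ can be reproduced with an exact-arithmetic model of a guard-digit subtraction (hypothetical helpers, round half to even, $y$ truncated to $p + 1$ bits). The sketch computes the relative error of the computed $x - y$, divided by $\varepsilon = 2^{-p}$, for the stated $x$ and $y$ at several precisions.

```python
from fractions import Fraction


def round_to_p_digits(r: Fraction, beta: int, p: int) -> Fraction:
    """Round r > 0 to p significant base-beta digits, ties to even."""
    b = Fraction(beta)
    e = 0
    while r >= b ** (e + 1):  # find e with beta**e <= r < beta**(e + 1)
        e += 1
    while r < b ** e:
        e -= 1
    ulp = b ** (e - p + 1)
    return round(r / ulp) * ulp


def worst_case_ratio(p: int) -> Fraction:
    """Relative error / epsilon for x = 1 + 2**(2-p), y = 2**(1-p) - 2**(1-2p), beta = 2."""
    x = 1 + Fraction(1, 2 ** (p - 2))
    y = Fraction(1, 2 ** (p - 1)) - Fraction(1, 2 ** (2 * p - 1))
    unit = Fraction(1, 2 ** p)
    y_bar = (y // unit) * unit            # y truncated to p + 1 bits
    computed = round_to_p_digits(x - y_bar, 2, p)
    rel = abs((x - y) - computed) / (x - y)
    eps = Fraction(1, 2 ** p)             # (beta/2) * beta**-p with beta = 2
    return rel / eps


for p in (5, 10, 24, 53):
    print(p, float(worst_case_ratio(p)))
```

Here $x - \bar{y} = 1 + 3 \cdot 2^{-p}$ is a tie that rounds back to $x$, so the whole of $y$ is lost. The ratio is $992/545 \approx 1.82$ at $p = 5$ and climbs toward 2 as $p$ grows, while staying strictly below 2.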

When adding numbers of the same sign, a guard digit is not necessary to achieve good accuracy, as shown in Theorem 10.
