Floating point numbers
JavaScript does not have special data types for numbers like int or float in languages like Java and C++. Everything is simply number.
This can create issues when dealing with floating-point arithmetic, such as the classic example of asking JavaScript to calculate the value of 0.1 + 0.2, which yields 0.30000000000000004 rather than 0.3. This is the same result you will get from languages like Python - or from any system that implements the IEEE 754 standard for floating-point arithmetic.
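You can verify this directly in any JavaScript console:

```javascript
console.log(0.1 + 0.2);          // 0.30000000000000004
console.log(0.1 + 0.2 === 0.3);  // false
```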
Rounding errors
When computers work with real numbers, they can't store them exactly, since memory is inevitably finite. At some point, real numbers have to be approximated into a limited number of bits (0s and 1s). This means that most floating-point arithmetic results are not exact, and rounding becomes necessary. The techniques for rounding are a big part of what makes floating-point arithmetic unique.
One key idea employed in modern floating point arithmetic is guard digits, which help reduce rounding errors - especially when subtracting numbers that are very close together. IBM thought guard digits were so important that in 1968, they updated their entire System/360 architecture to add a guard digit to its double precision format. They even went and upgraded machines already in the field!
The IEEE floating-point standard goes a step further. It defines specific algorithms for basic operations (addition, subtraction, multiplication, division, and square root) and requires that all implementations produce results identical to those algorithms. That means a program will give exactly the same results - bit for bit - on any system that follows the standard, which makes software behave predictably when it moves between different machines.
Dissecting a number
There are two different IEEE standards for floating-point computation: IEEE 754, a binary standard, and IEEE 854, which allows either base 2 or base 10. JavaScript's number is an IEEE 754 double precision (64-bit) value, laid out as follows:
- the first bit is the sign bit
- the next 11 bits are the exponent
- the remaining 52 bits are the significand (also called the mantissa)
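A quick sketch of how you can pull those three fields out of a JavaScript number using a DataView (dissect is just an illustrative helper name):

```javascript
function dissect(x) {
  // Write the number into an 8-byte buffer and read back the raw bits.
  const view = new DataView(new ArrayBuffer(8));
  view.setFloat64(0, x);
  const bits = view.getBigUint64(0);
  return {
    sign: Number(bits >> 63n),                // 1 bit
    exponent: Number((bits >> 52n) & 0x7ffn), // 11 bits (biased by 1023)
    significand: bits & 0xfffffffffffffn,     // 52 bits
  };
}

console.log(dissect(0.1));
// { sign: 0, exponent: 1019, significand: 2702159776422298n }
```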
When computers represent real numbers, the most common method is called floating-point representation. It works a bit like scientific notation and involves a base (usually 2 or 10) and a precision (how many digits to keep). For example, with base 10 and precision 3, the number 0.1 is represented as 1.00 × 10^-1.
In a binary number system, certain decimal numbers like 0.1 cannot be represented exactly: in base 2, 0.1 becomes the infinitely repeating fraction 0.000110011001100..., which has to be cut off at some number of bits.
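JavaScript can show this expansion directly; toString(2) prints the binary digits the double actually stores, repeating 0011 until the final rounded digit:

```javascript
console.log((0.1).toString(2));
// 0.0001100110011001100110011001100110011001100110011001101
```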
In floating point arithmetic and associated proofs, a floating point number is typically written as ±d.dd...d × β^e, where d.dd...d is the significand with p digits, β is the base, and e is the exponent.
Reasons for precision loss
There are two main reasons for losing precision.
- Rounding error - numbers like 0.1 can't be precisely expressed in binary, even though they're simple to express precisely in decimal.
- Out of range - the number is too large or too small to fit in the available exponent range.
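Both failure modes are easy to observe in JavaScript (a minimal sketch):

```javascript
// 1. Rounding error: 0.1 has no exact binary representation.
console.log(0.1 + 0.2 === 0.3);     // false

// 2. Out of range: the exponent overflows to Infinity or underflows to 0.
console.log(Number.MAX_VALUE * 2);  // Infinity
console.log(Number.MIN_VALUE / 2);  // 0
```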
Normalization and Uniqueness
Floating-point numbers can sometimes have more than one representation - for example, both 0.01 × 10^1 and 1.00 × 10^-1 represent the value 0.1.
Note that when you see a floating-point number like 1.00 × 10^-1 with a nonzero leading digit, it is said to be normalized (0.01 × 10^1 is not). Requiring representations to be normalized makes each floating-point number's representation unique.
Relative error and Ulps
Rounding error is measured by units in the last place (ulps).
Suppose you are working with p = 3 decimal digits, and the real value of a computation is 0.0314. If the computed result is 3.12 × 10^-2, it is in error by 2 ulps, because the unit in the last place here is 0.01 × 10^-2.
Another way to measure error is relative error, which compares the difference between the floating-point number and the real number, relative to the real number. For example, approximating 3.14159 by 3.14 × 10^0 gives a relative error of 0.00159 / 3.14159 ≈ 0.0005.
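Here is a short sketch that measures the familiar 0.1 + 0.2 error both ways. The ulp function is a hand-rolled helper (not a built-in), which finds the gap to the next double by bumping the raw bit pattern:

```javascript
function ulp(x) {
  // Gap between x and the next representable double above it
  // (assumes x is positive and finite).
  const view = new DataView(new ArrayBuffer(8));
  view.setFloat64(0, x);
  view.setBigUint64(0, view.getBigUint64(0) + 1n);
  return view.getFloat64(0) - x;
}

const err = (0.1 + 0.2) - 0.3;
console.log(err / ulp(0.3)); // 1 -> the sum is off by exactly 1 ulp
console.log(err / 0.3);      // ~1.85e-16, the same error expressed relatively
```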
Ulps vs Relative Error
When rounding to the closest floating-point number, the error is always at most 0.5 ulp. Expressed as relative error, however, the size of that same error is not fixed: it depends on where the value falls between consecutive powers of the base.
Wobble is the factor by which the relative error corresponding to a fixed error in ulps can vary - it can be as large as β, the base of the representation. Essentially, relative error is influenced by the base and precision of the representation.
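A tiny illustration of the wobble in binary: every double in [1, 2) shares the same ulp of 2^-52, so a half-ulp error is a relative error of about 1.1e-16 near 1 but only half that near 2 - the factor-of-β spread:

```javascript
const halfUlp = 2 ** -53;         // half the ulp of any double in [1, 2)

console.log(halfUlp / 1.0);       // 1.1102230246251565e-16
console.log(halfUlp / 1.9999999); // ~5.55e-17, roughly half as large
```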
When you care about the exact error introduced by rounding, ulps is the more natural measure. But when you're analyzing more complex formulas, relative error gives you a clearer picture of how rounding affects the final result.
If you're only interested in the general size of rounding errors, ulps and relative error are often interchangeable, though ulps are usually easier to work with for smaller errors.
Guard Digits
When subtracting two floating-point numbers—especially numbers that are very close to each other—rounding errors can get much worse than usual. One way to reduce this kind of error is to use guard digits.
Suppose we're using a floating-point format that keeps only 3 digits (p = 3). If we try to compute something like 2.15 × 10^12 - 1.25 × 10^-5, the smaller number must first be shifted right so that the two exponents line up.
However, hardware is limited to a fixed number of digits, so when the smaller number is shifted to line up with the big one, its digits are truncated. In this case, it becomes 0.00 × 10^12, and the computed answer 2.15 × 10^12 is still essentially correct, because the quantity that was thrown away is vanishingly small relative to the result.
However, this fails when the numbers are close together. For example, 10.1 - 9.93 in floating point is 1.01 × 10^1 - 0.993 × 10^1. Shifting and truncating the second operand to three digits gives 0.99 × 10^1, so the computed difference is 0.02 × 10^1 = 0.2, while the correct answer is 0.17 - wrong in every digit.
Guard digits provide extra digits for intermediate calculations. Even if the final result is rounded to 3 digits, those extra digits in the middle help catch and reduce big subtraction errors like the one above - this is exactly the error that IBM's 1968 guard-digit retrofit, mentioned earlier, was designed to avoid.
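A toy simulation of the example above, under the stated assumptions (base 10, p = 3; toParts and subtract are hypothetical helper names, and the simulation itself runs on doubles, so it is only a sketch). It aligns the smaller operand, truncates it to p digits plus an optional guard digit, and rounds the result back to p digits:

```javascript
function toParts(x) {
  // Decompose x into a significand in [1, 10) and a base-10 exponent.
  const e = Math.floor(Math.log10(Math.abs(x)));
  return { sig: x / 10 ** e, e };
}

function subtract(x, y, p, guardDigits) {
  const a = toParts(x);
  const b = toParts(y);
  const kept = p + guardDigits;
  // Shift y's significand to x's exponent, then truncate to `kept` digits.
  const shifted = b.sig * 10 ** (b.e - a.e);
  const truncated = Math.floor(shifted * 10 ** (kept - 1)) / 10 ** (kept - 1);
  // Subtract and round the result back to p significant digits.
  return Number(((a.sig - truncated) * 10 ** a.e).toPrecision(p));
}

console.log(subtract(10.1, 9.93, 3, 0)); // 0.2  -- no guard digit: wrong in every digit
console.log(subtract(10.1, 9.93, 3, 1)); // 0.17 -- one guard digit recovers the exact answer
```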
Floating point arithmetic
What is NaN
In IEEE 754, NaNs are represented as floating-point numbers with the exponent e_max + 1 and a nonzero significand.
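In JavaScript, NaN shows up whenever an operation has no meaningful numeric result, and it compares unequal to everything, including itself:

```javascript
console.log(0 / 0);               // NaN
console.log(Math.sqrt(-1));       // NaN
console.log(Infinity - Infinity); // NaN
console.log(NaN === NaN);         // false
console.log(Number.isNaN(NaN));   // true -- the reliable way to test for it
```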
Limits of precision
The Number object provides constants that mark these limits, such as Number.MAX_SAFE_INTEGER (2^53 - 1, the largest integer guaranteed to be represented exactly) and Number.MAX_VALUE (the largest representable number, about 1.8 × 10^308). These bounds exist at the lower end as well: Number.MIN_SAFE_INTEGER and Number.MIN_VALUE (the smallest positive double, 5 × 10^-324).
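All of these limits are observable directly on Number (standard constants, no assumptions here):

```javascript
console.log(Number.MAX_SAFE_INTEGER); // 9007199254740991 (2^53 - 1)
console.log(Number.MAX_SAFE_INTEGER + 1 === Number.MAX_SAFE_INTEGER + 2); // true!
console.log(Number.MAX_VALUE);        // 1.7976931348623157e+308
console.log(Number.MIN_SAFE_INTEGER); // -9007199254740991
console.log(Number.MIN_VALUE);        // 5e-324
console.log(Number.EPSILON);          // 2.220446049250313e-16 (2^-52, the ulp of 1)
```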