Floating point numbers
JavaScript does not have special data types for numbers like `int` or `float` in languages like Java and C++. Everything is simply `number`.
JavaScript represents all numeric values with floating point numbers. The precision limitations of floating point numbers can lead to surprising issues, such as the canonical example of asking JavaScript to calculate the value of `0.1 + 0.2`.
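You can reproduce it in any JavaScript console:

```js
// The doubles nearest to 0.1 and 0.2 sum to a value that is
// not the double nearest to 0.3.
console.log(0.1 + 0.2);         // 0.30000000000000004
console.log(0.1 + 0.2 === 0.3); // false
```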
This is a common result that you will also get from other languages like Python. Generally, any system that implements the IEEE 754 standard for floating point arithmetic will produce this result.
```
Python 3.13.2 (...) on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> 0.1 + 0.2
0.30000000000000004
>>> 0.1 + 0.2 == 0.3
False
```
While most computer engineers are familiar with this behavior, very few understand why this happens. To quote the opening sentence of David Goldberg's comprehensive guide:
Floating-point arithmetic is considered an esoteric subject by many people.
Goldberg's guide to floating point arithmetic is written for an engineering audience. This guide is designed to be comprehensible by anyone - young people, the mathematically uninitiated, and the intellectually curious.
Binary numbers
Computers represent all numbers with binary digits - 0s and 1s - where each digit is called a bit. You can experiment with the 8-bit binary number below by clicking on the bits to change them from 0 to 1.

(Interactive widget: an 8-bit binary number and its decimal value)
Each digit represents a power of 2, just like our human numbering system uses each digit to represent a power of 10. A group of 8 bits is called a byte, and a single byte can represent $2^8$ (256) different values.
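As a quick sanity check, JavaScript's built-ins can convert between the two bases (the bit pattern here is just an arbitrary example):

```js
// 10110010 in binary is 128 + 32 + 16 + 2 = 178 in decimal.
console.log(parseInt('10110010', 2)); // 178
console.log((178).toString(2));       // "10110010"
```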
In the example above, we are using 8 bits to represent an unsigned integer. If we want to store negative numbers, we can dedicate 1 bit as a sign bit to store 0 for positive numbers and 1 for negative numbers.
(Interactive widget: an 8-bit binary number with a dedicated sign bit)
Try clicking the sign bit to see how the resulting value changes. If we allowed the sign bit to simply control whether there is a negative sign or not, we would have two binary numbers that both represent zero - `10000000` (negative zero) and `00000000` (positive zero). To avoid this ambiguity, along with a variety of other reasons, computers represent negative numbers in two's complement: if the sign bit is 1, invert all the bits (each 1 becomes a 0, and each 0 becomes a 1) and add 1 to the result to get the magnitude of the negative number.
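Here is a minimal sketch of that decoding rule in JavaScript (the function name and the subtraction shortcut are ours - subtracting $2^8$ when the sign bit is set is equivalent to the invert-and-add-1 rule):

```js
// Interpret an 8-bit pattern as a two's complement signed integer.
function fromTwosComplement8(byte) {
  // If the sign bit (bit 7) is set, the value is byte - 2^8,
  // which matches "invert all bits and add 1" for the magnitude.
  return byte & 0b10000000 ? byte - 0b100000000 : byte;
}

console.log(fromTwosComplement8(0b00000001)); // 1
console.log(fromTwosComplement8(0b11111111)); // -1
console.log(fromTwosComplement8(0b10000000)); // -128
```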
Floating point numbers
Floating point numbers represent numeric values in a binary form of scientific notation ($\pm m \times 2^e$). Each number is made of three components:

- The sign - a single bit which indicates whether the number is positive or negative, where 0 is positive and 1 is negative.
- The exponent (written as $e$) - determines the scale of the number (how big or small it is).
- The significand (written as $m$) - determines the precision of the number (the actual digits), and is often referred to as the mantissa.
JavaScript uses the 64-bit IEEE 754 double precision format for all numbers. This means that each 64-bit number uses 1 bit for the sign, 11 bits for the exponent, and 52 bits for the significand.
(Interactive widget: the 64 bits of a double, grouped as sign | exponent | significand)
The equation for computing a decimal value from the binary representation above (for normal numbers) is

$$(-1)^{s} \times \left(1 + \frac{f}{2^{52}}\right) \times 2^{e - 1023}$$

where $s$ is the sign bit, $f$ is the 52-bit significand field read as an integer, and $e$ is the 11-bit exponent, stored with a bias of 1023.
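To make this concrete, here is a small sketch (the helper name is ours) that extracts the three fields from a JavaScript number and rebuilds the value using the equation above:

```js
// Split a double into its sign, biased exponent, and significand fields.
function decomposeFloat64(x) {
  const view = new DataView(new ArrayBuffer(8));
  view.setFloat64(0, x); // big-endian by default
  const hi = view.getUint32(0); // sign (1) | exponent (11) | significand high 20
  const lo = view.getUint32(4); // significand low 32 bits
  return {
    sign: hi >>> 31,
    exponent: (hi >>> 20) & 0x7ff,
    significand: (hi & 0xfffff) * 2 ** 32 + lo, // < 2^52, so exact
  };
}

const { sign, exponent, significand } = decomposeFloat64(0.1);
console.log(sign, exponent, significand); // 0 1019 2702159776422298

// Rebuild the value with the formula for normal numbers.
const rebuilt = (-1) ** sign * (1 + significand / 2 ** 52) * 2 ** (exponent - 1023);
console.log(rebuilt === 0.1); // true
```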
Floating point intuition
Let's first try to understand this intuitively. The exponent really just selects an interval between two successive powers of 2, like $[2^e, 2^{e+1})$, and the significand picks out a value within that interval.

(Interactive widget: how the exponent and significand map to a value)
As a floating point number gets larger, it "floats" to the next interval, and as it gets smaller, it "floats" to the previous interval. Intervals closer to zero are "more dense", in the sense that the significand provides a more precise number along that interval.
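You can observe this density difference directly in JavaScript - the gap between adjacent representable numbers grows with the magnitude:

```js
// Near 1, adjacent doubles are 2^-52 apart (Number.EPSILON)...
console.log(1 + 2 ** -52 === 1);      // false - the sum is representable
console.log(1 + 2 ** -53 === 1);      // true  - too small, rounds back to 1
// ...while near 2^53, adjacent doubles are a whole 2 apart.
console.log(2 ** 53 + 1 === 2 ** 53); // true  - 2^53 + 1 is not representable
```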
The significand is being displayed as a number, but it actually represents the sequence of binary digits of a normalized value, which takes the form $1.b_1 b_2 \ldots b_{52} \times 2^{e}$.
You will often see floating point numbers written in scientific notation ($d.dd\ldots d \times \beta^{e}$, where $\beta$ is the base - 10 for decimal, 2 for binary).
Floating point proofs
In floating point arithmetic and associated proofs, a floating point number is typically written as $\pm d.dd\ldots d \times \beta^{e}$, where $d.dd\ldots d$ is called the significand and has $p$ digits, $\beta$ is the base, and $e$ is the exponent.
Rounding errors
Just as we cannot precisely represent $1/3$ in decimal ($0.333\ldots$), we cannot precisely represent $0.1$ in binary - its expansion $0.00011001100110011\ldots_2$ repeats forever, so the stored value must be rounded to the nearest representable number. The difference between the real number and its stored approximation is the rounding error.
One key idea employed in modern floating point arithmetic is guard digits, which help reduce rounding errors - especially when subtracting numbers that are very close together. IBM thought guard digits were so important that in 1968, it added a guard digit to the double precision format of its entire System/360 architecture, and even upgraded machines already in the field.
IEEE standards
The IEEE floating-point standards define specific algorithms for basic operations like addition, subtraction, multiplication, division, and square root. All implementations must produce results identical to those algorithms, especially with respect to rounding. This is to safeguard the assumption that a program will give exactly the same results - bit for bit - on any system that follows the standard. The IEEE standard has been widely adopted by hardware manufacturers for implementing floating point arithmetic.
There are two different IEEE standards for floating-point computation. IEEE 754 is a binary standard that requires $\beta = 2$, with $p = 24$ for single precision and $p = 53$ for double precision, and it specifies exactly how floating-point numbers are laid out in bits. The other, IEEE 854, is more general: it allows either $\beta = 2$ or $\beta = 10$ and does not prescribe a bit-level encoding.
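JavaScript exposes both IEEE 754 binary precisions: numbers are doubles ($p = 53$), while Math.fround rounds to the nearest single precision value ($p = 24$), which makes the extra rounding visible:

```js
console.log(0.1);              // 0.1 (nearest double, p = 53)
console.log(Math.fround(0.1)); // 0.10000000149011612 (nearest single, p = 24)
```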
Reasons for precision loss
There are two main reasons for losing precision, both illustrated in the sketch after this list.
- Rounding error - numbers like 0.1 can't be precisely expressed in binary, even though they're simple to express precisely in decimal.
- Out of range - the number is too large or too small to fit in the available exponent range.
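A short sketch showing both failure modes:

```js
// Rounding error: 0.1 is silently stored as the nearest double.
console.log((0.1).toFixed(20)); // 0.10000000000000000555

// Out of range: the exponent field can't scale any higher, so the
// result overflows to Infinity.
console.log(Number.MAX_VALUE * 2); // Infinity
```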
Normalization and Uniqueness
Floating-point numbers can sometimes have more than one representation - for example, with $\beta = 10$ and $p = 3$, both $0.01 \times 10^{1}$ and $1.00 \times 10^{-1}$ represent the value $0.1$. Requiring that the first digit of the significand be nonzero makes the representation unique, and a representation in this form is called normalized.
Note that when you see a floating-point number in normalized binary form, the digit before the point is always 1, so it does not need to be stored explicitly - this is how a 52-bit significand field delivers 53 bits of precision.
Relative error and Ulps
Rounding error is measured in units in the last place (ulps).
Suppose you are working with a floating-point number, and the real value is $0.0314$, while the computed result is $3.12 \times 10^{-2}$. The result is in error by 2 units in the last place, since the last digit of the significand is off by 2.
Another way to measure error is relative error, which compares the difference between the floating-point number and the real number, relative to the real number. For example, if you approximate $3.14159$ by $3.14$, the relative error is $0.00159 / 3.14159 \approx 0.0005$.
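The same calculation in code (plain decimal arithmetic, just to illustrate the definition):

```js
// Relative error of approximating pi (to 5 digits) by 3.14.
const relativeError = Math.abs(3.14159 - 3.14) / 3.14159;
console.log(relativeError); // ~0.000506
```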
Ulps vs Relative Error
When rounding to the closest floating-point number, the error is always at most 0.5 ulp. However, when you express that same error as a relative error, its size can vary - by up to a factor of $\beta$ - depending on where the number falls within its interval.
Wobble is the factor that measures how much the relative error can vary due to the way floating-point numbers are stored. Essentially, relative error can be influenced by the base and precision of the representation.
When you care about the exact error introduced by rounding, ulps is a more natural measure. But when you're analyzing more complex formulas, relative error gives you a clearer picture of how rounding affects the final result, especially in terms of overall computation.
If you're only interested in the general size of rounding errors, ulps and relative error are often interchangeable, though ulps are usually easier to work with for smaller errors.
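This is also why equality comparisons on computed floats usually use a relative tolerance rather than ===. A common sketch (the helper and its default tolerance are our choices, not a standard API):

```js
// Compare two numbers up to a small relative tolerance instead of ===.
function nearlyEqual(a, b, relTol = 4 * Number.EPSILON) {
  return Math.abs(a - b) <= relTol * Math.max(Math.abs(a), Math.abs(b));
}

console.log(0.1 + 0.2 === 0.3);           // false
console.log(nearlyEqual(0.1 + 0.2, 0.3)); // true
```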
Guard Digits
When subtracting two floating-point numbers - especially numbers that are very close to each other - rounding errors can get much worse than usual. One way to reduce this kind of error is to use guard digits.
Suppose we're using a decimal floating-point format that keeps only 3 digits (p = 3). If we try to compute something like $2.15 \times 10^{12} - 1.25 \times 10^{-5}$, the smaller number must be shifted far to the right so the digits line up. However, hardware is limited to a fixed number of digits, so when the smaller number is shifted to line up with the big one, its digits are truncated. In this case, it becomes $0$, and the computed result is just $2.15 \times 10^{12}$ - which happens to also be the correctly rounded answer, so the truncation does no harm.

However, this fails when the numbers are close together. For example, $10.1 - 9.93$ in 3-digit floating point is computed as $1.01 \times 10^{1} - 0.99 \times 10^{1} = 0.2$, while the true answer is $0.17$ - an error of 30 ulps.
Guard digits provide extra digits for intermediate calculations. Even if the final result is rounded to 3 digits, those extra digits in the middle help catch and reduce big subtraction errors like the one above. In 1968, IBM added a guard digit to the double precision format of its System/360 floating-point units, and even retrofitted all of its older machines in the field - it was that important.
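You can see the underlying problem - cancellation - in ordinary JavaScript. Subtracting nearly equal numbers wipes out the leading digits and leaves the operands' rounding error exposed:

```js
// The subtraction itself is exact here, but it cancels the leading
// digits and exposes the rounding error already stored in 1.0000001.
const diff = 1.0000001 - 1.0;
console.log(diff);          // close to, but not exactly, 1e-7
console.log(diff === 1e-7); // false
```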
What is NaN
In IEEE 754, NaNs are often represented as floating-point numbers with the exponent $e_{\max} + 1$ and nonzero significands.
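In JavaScript, the most visible consequence is that NaN is the only value that is not equal to itself:

```js
console.log(0 / 0);               // NaN - an invalid operation
console.log(NaN === NaN);         // false - NaN never equals anything
console.log(Number.isNaN(0 / 0)); // true - the reliable way to test for NaN
```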
Limits of precision
The `Number` class provides constants that mark the limits of integer precision. `Number.MAX_SAFE_INTEGER` is $2^{53} - 1$, the largest integer that can be represented and still distinguished from its neighbors - beyond it, not every integer has an exact representation. This bound exists lower as well: `Number.MIN_SAFE_INTEGER` is $-(2^{53} - 1)$.
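A quick demonstration of the upper bound:

```js
console.log(Number.MAX_SAFE_INTEGER); // 9007199254740991 (2^53 - 1)
// Above this bound, adjacent doubles are more than 1 apart,
// so consecutive integers start to collide.
console.log(2 ** 53 === 2 ** 53 + 1); // true
console.log(Number.MIN_SAFE_INTEGER); // -9007199254740991
```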