Numerical Analysis

If you build computer systems, chances are you'll need to understand how floating-point arithmetic works. Surprisingly, there aren't many clear and detailed resources on the topic. One of the few books that covers it in depth, Floating-Point Computation by Pat H. Sterbenz, is difficult to acquire and tedious to read.

In this part of the site, I'm building a practical introduction to the numerical analysis behind floating-point arithmetic for system builders.

This is a work in progress, and will probably remain so for quite some time - I am still stuck on just making mathematical symbols appear properly here.

Rounding errors

Different rounding strategies affect the results of the basic arithmetic operations: addition, subtraction, multiplication, and division. Rounding error is usually measured in one of two ways: in ulps (units in the last place) or as relative error.
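To make the two measures concrete, here is a minimal sketch in Python. It assumes Python 3.9 or later (for math.ulp) and takes the ulp at the computed value, which is one of several conventions in use. The function name rounding_error is my own, chosen for illustration.

```python
import math
from fractions import Fraction


def rounding_error(exact: Fraction, computed: float) -> tuple[float, float]:
    """Return (error in ulps, relative error) of `computed` against `exact`.

    `exact` must be nonzero. The ulp is taken at `computed`, which is an
    assumption of this sketch; other definitions take it at the exact value.
    """
    # Fraction(computed) is the exact rational value of the float,
    # so the subtraction below introduces no rounding of its own.
    abs_err = abs(Fraction(computed) - exact)
    err_in_ulps = abs_err / Fraction(math.ulp(computed))
    rel_err = abs_err / abs(exact)
    return float(err_in_ulps), float(rel_err)


# The literal 0.1 is the double nearest to 1/10, not 1/10 itself.
ulps, rel = rounding_error(Fraction(1, 10), 0.1)
print(f"error of 0.1: {ulps:.3f} ulps, relative error {rel:.3e}")
```

With round-to-nearest, a single correctly rounded value should be off by at most half an ulp, while the relative error for the same half-ulp can vary by up to a factor of the base depending on where the value sits between powers of two.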

The IEEE Floating-Point Standard has been widely adopted by hardware manufacturers for implementing floating-point arithmetic. It includes rules for rounding and builds on concepts introduced in the first section.
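As a small illustration of one of those rounding rules, the sketch below shows the default round-to-nearest, ties-to-even behavior. It assumes Python's float is an IEEE double with a 53-bit significand, which holds on mainstream platforms but is not guaranteed by the language. The helper name round_to_double is hypothetical.

```python
import sys

# This sketch only makes sense for a 53-bit significand (IEEE double).
assert sys.float_info.mant_dig == 53


def round_to_double(n: int) -> int:
    """Convert an integer to the nearest double and back to an integer."""
    return int(float(n))


# From 2**53 up to 2**54, consecutive doubles are 2 apart, so every odd
# integer in that range lies exactly halfway between two doubles.
# Ties go to the neighbor whose significand is even.
base = 2 ** 53
print(round_to_double(base + 1) - base)  # tie between base and base + 2
print(round_to_double(base + 3) - base)  # tie between base + 2 and base + 4
```

The first tie rounds down and the second rounds up, so repeated ties do not drift consistently in one direction.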

The final section explores how floating-point arithmetic influences system design, including topics like instruction sets, compiler optimization, and how exceptions are handled.

I've tried to avoid making any claims without also explaining why they're true. Most of the reasoning only requires basic calculus, and I've kept the more proof-heavy stuff in The Theorems.

Exactness

In floating-point arithmetic, exact results are rare because most operations round. But knowing when you can subtract without error is useful for designing robust numerical algorithms, such as computing square roots, computing differences, or detecting small perturbations.
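One well-known condition for error-free subtraction is usually called Sterbenz's lemma: if y/2 <= x <= 2y, then x - y is computed exactly. The sketch below checks this empirically, assuming IEEE doubles with gradual underflow. The helper name subtraction_is_exact is mine, and a random test like this is a sanity check, not a proof.

```python
import random
from fractions import Fraction


def subtraction_is_exact(x: float, y: float) -> bool:
    """True if the floating-point result x - y equals the exact difference."""
    # Fractions hold the exact rational value of each float, so the
    # right-hand side is the true mathematical difference.
    return Fraction(x - y) == Fraction(x) - Fraction(y)


random.seed(0)
for _ in range(10_000):
    y = random.uniform(1.0, 1000.0)
    x = random.uniform(y / 2, 2 * y)  # keeps x within a factor of 2 of y
    assert subtraction_is_exact(x, y)

# Outside that range, exactness can fail: 1 - 2**-60 needs more than
# 53 significant bits, so the result is rounded.
print(subtraction_is_exact(1.0, 2.0 ** -60))
```

The intuition is that when the operands are within a factor of two of each other, the difference is no larger than the smaller operand and is a multiple of its last-place unit, so it fits in the available bits.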