Artificial Wasteland · a showing you can operate
The Number That Undid the Root
To make a 3D world you divide by the length of a vector thousands of times a frame — and each length is a square root, once the slowest arithmetic a chip could do. In 1999 the code for Quake III Arena shipped with a way around it: read a decimal number's raw bits as if they were an integer, subtract that from one strange hexadecimal constant, read the bits back as a decimal, and — with no division and no square root — you are within a fraction of a percent of 1/√x. Here it is, running on any number you give it, with every claim recomputed in front of you.
The famous fragment, verbatim from q_math.c — right down to the two comments
that made it legend:
float Q_rsqrt( float number ) { long i; float x2, y; const float threehalfs = 1.5F; x2 = number * 0.5F; y = number; i = * ( long * ) &y; // evil floating point bit level hacking i = 0x5f3759df - ( i >> 1 ); // what the fuck? y = * ( float * ) &i; y = y * ( threehalfs - ( x2 * y * y ) ); // 1st iteration // y = y * ( threehalfs - ( x2 * y * y ) ); // 2nd iteration, this can be removed return y; }
Two lines carry the whole trick. i = *(long*)&y takes the 32 bits that
spell the floating-point number and, without changing a single one of them, calls
them a plain integer. i = 0x5f3759df - (i>>1) then does ordinary integer
arithmetic on that — a halving and a subtraction — and when you spell the result back as a
float, it is already a good guess at 1/√x. The last line cleans it
up. Nothing here should work. Watch it work.
The bits are already a logarithm
The reason this isn't sorcery is that an IEEE-754 float is built out of a logarithm. A positive number is stored as x = (1 + m)·2E−127: an 8-bit exponent E and a 23-bit fraction m ∈ [0,1). Take the 32 bits as an integer and you get exactly I = 223·(E + m). Now look at the true logarithm: log₂x = (E − 127) + log₂(1 + m). Over m ∈ [0,1) the curve log₂(1 + m) is nearly the straight line m — so, up to a small constant nudge σ, the integer is the logarithm, rescaled:
I / 223 − 127 ≈ log₂x
Amber: the true log₂x. Cyan dashes: I/2²³ − 127, computed from the raw bits. They part company by at most 0.086 — the whole reason the hack is approximate and not exact.
Once the bits are a logarithm, the impossible line becomes obvious. You want
y = x−1/2, i.e. log₂y = −½·log₂x.
In the integer picture that is just
Iy = ³⁄₂·2²³·(127−σ) − ½·Ix. The
−½·Ix is i >> 1 subtracted; the
constant out front is 0x5f3759df. Multiplying two numbers became
subtracting their logs — Napier's four-hundred-year-old idea, hiding in the bit
layout of a float. Solve 0x5f3759df = ³⁄₂·2²³·(127−σ) and you
recover σ = 0.0450: the single fudge that best
straightens log₂(1+m).
The seed is good to about 3.4%. The last line of code fixes that. It is one step of Newton's method on f(y) = 1/y² − x — the function that is zero exactly when y = 1/√x. Work the iteration y ← y − f(y)/f′(y) through and it collapses, with no approximation, to precisely y·(1.5 − 0.5·x·y²). One pass takes the error from 3.4% to 0.175%; the second, commented-out pass would take it to 0.0005% — which is why they could delete it and never be missed.
Why that number?
The constant isn't magic and it isn't unique. It is the value of ³⁄₂·2²³·(127−σ) for the best σ — but "best" depends on what you minimise, and a search can beat the round-number choices. Drag it and watch: the whole error curve slides, and 0x5f3759df sits down near the bottom of a basin. Push it a few thousand either way and the error balloons.
Signed relative error across x ∈ [0.25, 4] (the pattern then repeats every factor of 4). The vertical scale re-fits to whatever the current worst case is — read it off the label above.
Chris Lomont worked the algebra out in a 2003 paper and, unsatisfied, searched every nearby integer for the one with the smallest worst-case error after a Newton step. The winner, 0x5f375a86, beats the shipped constant — but by a hair: on the metric above it improves the worst case from 0.17522% to 0.17513%. His cleaner closed-form value, 0x5f37642f, is derived to minimise the error of the raw seed — and there it genuinely wins (switch the instrument to raw seed and watch its curve drop below Quake's). But add the Newton step and it falls behind: the optimum moves, because the iteration reshapes the error. The tidy optimum and the min-max optimum are not the same point. Whoever first chose 0x5f3759df — without the algebra — had already landed within a whisker of the best number there is. (You can go further still: in 2009 Jan Kadlec retuned the constant and the Newton coefficients together, reaching ≈0.065% — but that is no longer a plain Newton step, it is a fitted correction.)
The record — who wrote it, and the thing everyone gets wrong
The name attached to this function is almost always John Carmack, id Software's lead programmer, because it surfaced when Quake III's source was released under the GPL at QuakeCon in 2005. But Carmack did not write it: asked directly, he disclaimed it and pointed elsewhere (by one account suggesting Terje Mathisen, who in turn said it wasn't his either). The trail runs back much further than the game.
The most careful tracing is Rys Sommefeldt's, who chased it down for Beyond3D. His reconstruction: the routine reached id from earlier code; a related use runs through Gary Tarolli (SGI, later 3dfx); and the constant itself appears to trace to Greg Walsh at Ardent Computer in the late 1980s, who devised it drawing on ideas from Cleve Moler (later of MATLAB) and, further back, unpublished work by William Kahan and K. C. Ng at Berkeley. This is the best-supported account — but it rests largely on recollections given decades after the fact, and no dated original artifact showing the exact constant 0x5f3759df has ever surfaced. So the honest statement is that it is folklore refined by several hands, its precise author unproven — not one person's flash of genius, and certainly not Carmack's. Where the record is uncertain, this page leaves it uncertain.▸ sources
And a caveat the legend usually drops: this trick is obsolete.
Since the early 2000s, x86 chips have a dedicated instruction (rsqrtss) that
returns 1/√x faster and more accurately than the bit-hack. The
function is a beautiful period piece — worth understanding, not worth shipping.
The check — everything above, recomputed
Nothing here is asserted from memory. A verifier reproduces the exact float32 arithmetic and confirms, 13/13 green:
- the bit reinterpretation round-trips exactly (the aliasing is honest);
- 0x5f3759df = 1,597,463,007, encoding σ ≈ 0.04505 — Lomont's value;
- I/2²³ − 127 tracks log₂x to within 0.086 raw, 0.043 once centred;
- raw seed max error 3.4376% → 1 Newton 0.17522% → 2 Newton 0.00047%;
- the 1.5 − 0.5·x·y² line is one Newton–Raphson step on 1/y² − x (checked symbolically at sample points);
- Lomont's search constant beats Quake's; his analytic one does not (on min-max) — both reported as measured, not as the story would prefer;
- the relative error is periodic over a factor of 4 in x.
The verifier and this page share the same reinterpretation code: research/fast-inverse-sqrt/verify.mjs
Domain notes: the hack is defined for positive, normal floats — the sign bit must be 0, and the derivation assumes the exponent stays in range. It computes an approximation; the guarantees above are worst-case bounds over x ∈ [0.25, 4], which by the periodicity property cover all positive normals. The single free choice is σ (equivalently, the constant); everything else is forced.