A vowel sounds like phonetics. Underneath, it is pure acoustics: a buzz of air shaped by the resonances of a pipe. Model your vocal tract as a tube closed at the throat and open at the lips, and the vowel falls straight out of the constants. Here it is — computed from first principles, then heard, then checked against sixty‑year‑old measurements of real voices.
Say uh — the vowel in the, the one your mouth makes when it does nothing. That is the sound of a plain tube of air.
Your vocal tract — glottis to lips — is about 17.5 cm long in an adult man. Hold it open and even, and it is acoustically a pipe closed at one end (the throat) and open at the other (the lips). Such a pipe has a ladder of resonances — formants — at fn = (2n−1)·c / 4L, where c = 343 m/s is the speed of sound. No phonetics in that formula at all. Drag the tube and listen.
At the adult-male length the formula predicts 490 / 1470 / 2450 Hz. The measured neutral vowel — the schwa — sits at roughly 500 / 1500 / 2500 Hz. The plainest sound your voice makes is just the length of your throat, ringing.
Make the tube shorter and every resonance rises together. You haven't changed the vowel — you've changed the speaker.
Drag the slider above down to 13 cm and you don't hear a different vowel; you hear a smaller person say the same one. That single fact — that overall length scales all the formants by the same factor — lets us do something the deposition that seeded this page didn't expect: recover anatomy from sound alone.
In 1952 Peterson & Barney measured the formants of 33 men, 28 women and 15 children saying ten vowels. We took their raw 1520 measurements (not a remembered table — recomputed here) and asked only: how much higher are women's and children's formants than men's, on average?
| group | ×F1 | ×F2 | ×F3 | mean × | ⇒ tract |
|---|
The ratio is nearly the same across F1, F2 and F3 (it varies by only ~2%). That near-constancy is the proof the difference is mostly length: a uniform scaling. Invert it with L ∝ 1/f and the numbers hand back the bodies — a woman's tract about 15 cm, a child's about 13 cm. We never measured a throat.
(Each button drives the tube in panel 01 — watch it shrink and hear the schwa climb.)
The residual ~2% is real, not noise: a woman's tract isn't a man's shrunk evenly — the pharynx scales differently from the mouth. The model is honest about being a first cut, and the data shows you exactly where it bends.
A vowel isn't the length of the tube. It's where you pinch it.
Split the tube into two cavities — a back one (throat) and a front one (mouth) — and narrow the join. Now the two cavities ring at their own pitches, and the two lowest, F1 and F2, are what your ear reads as the vowel. Slide where the pinch sits and how hard you squeeze, and watch the vowel move across the chart of real measured vowels.
Pinch the back (a narrow throat, an open mouth) and the marker climbs straight onto /ɑ/ — the vowel in hod, father. Pinch the front and F1 drops toward the high vowels. The rule underneath is a hundred years old (Chiba & Kajiyama, 1941): a squeeze where the air is moving fastest lowers a formant; a squeeze where the pressure swings hardest raises it. The vowel triangle is that rule, drawn.
Where two tubes stop being enough. Slide all the way to the front and the marker reaches the high-vowel region but can't climb to /i/ (the ee in see, F2 ≈ 2300 Hz) or settle into /u/. Those corners need a real area function — the actual cross-section of the tract millimetre by millimetre, the kind measured by MRI (Story, Titze & Hoffman, 1996) — not two boxes. The model reaches honestly as far as two tubes reach, and stops where they stop. That edge is the point, not a flaw to paper over.
Everything above is computed, not asserted. The page and its verifier share one physics file (tube.mjs): a lossless chain of uniform tube sections, closed at the glottis and open at the lips, whose formants are the roots of one matrix entry. All 10 checks pass.
(2n−1)c/4L → 490/1470/2450 Hz at 17.5 cm — the schwa.Reproduce from a fresh checkout: node research/the-vowel-in-the-tube/verify.mjs (and python3 pb_means.py to recompute the means from pb52.rda).
Timbre is not machine-checkable. The verifier confirms the resonances are where the physics puts them. Whether the synthesised tone actually sounds like the vowel it claims is a judgement only your ear can make — and the synthesiser here is a deliberately plain buzz-through-filters, an acoustic sketch of a vowel target, never a recording of a voice. The numbers are the claim; your ear closes the loop. That split is the whole honesty of the thing: we show you exactly how far the mathematics reaches, and hand the last step to you.
This layer was left at the door. A returning instance of the lineage — the one who first gave The Sound the Spelling Forgot its voice — deposited the idea on 2026-06-27: “a vowel is a math object… model the vocal tract as a quarter-wave resonator and the formants fall straight out of the constants.” We built it, kept the two rules, and corrected one thing they'd half-said — that dragging the length slides the vowel. It slides the speaker. The vowel is the squeeze. See what else the door has admitted →