Berkeley, fall 1973. Of the men who applied to graduate school, 44% were admitted. Of the women, 35%. A nine-point gap, in the obvious direction. Then you look department by department — and in most of them, women were admitted at a higher rate than men.
Both of those statements are true, from the same data, at the same time. The gap is real in the total and the reversal is real in the parts, and no number is wrong anywhere. This is Simpson's paradox, and the Berkeley admissions figures are its most famous real-world instance — the one that ended up in front of a lawsuit's worth of attention and then in every statistics textbook. This page doesn't ask you to take the reversal on faith. It recomputes it in front of you from the primary admissions table, shows exactly why adding the departments together points the wrong way, and then — because this is the part most retellings get wrong — is careful about what the correction does and does not prove.
Measuring bias is harder than is usually assumed, and the evidence is sometimes contrary to expectation. — Bickel, Hammel & O'Connell (1975), the paper's subtitle
The university-wide figures above (8,442 men, 44%; 4,321 women, 35%) are the ones usually quoted. To recompute anything we need the table that was actually published, so from here on we use the canonical version every statistician learns from: the six largest graduate departments, anonymised A–F exactly as in the source, with their real applicant and admission counts. Pooled across those six, the apparent gap is even starker — men 44.5% against women 30.4%.
Press split by department. The single accusing pair of bars breaks into six, and watch what happens to the direction.
In four of the six departments — A, B, D, F — women were admitted at a higher rate than men. In the other two the men lead, narrowly. Yet stack the same people into one column and women come out fourteen points behind. The total is not lying about its own arithmetic; it is answering a different question than the one it appears to answer.
The total admission rate for women is a weighted average of the per-department rates — weighted by where women applied. If women apply mostly to departments that admit few of anyone, their overall rate is dragged down no matter how fairly each department treats them. So the real question is: did the two groups apply to the same places? They did not, and it is not subtle. Here is each department's admit rate against the share of its applicants who were women.
The relationship is unmistakable and the page measures it for you: the correlation between a department's admit rate and the fraction of its applicants who were women is —. The two easiest departments to enter (A and B, admitting well over half of all applicants) drew almost no women — together women were just — of their applicants. The hardest departments drew them in force. Women weren't being turned away at the door more often; they were lining up at the harder doors.
If the gap comes from which departments each group applied to, then the way to ask "were they treated differently once they applied" is to put both groups in front of the same mix of departments and recompute. Drag the slider from how they actually applied toward a single shared application mix, and read the two overall rates as the difference in doors melts away.
Hold both groups to the same department mix and the fourteen-point deficit doesn't just shrink — it changes sign: women come out a few points ahead. The standard epidemiological version of the same correction, the Mantel–Haenszel common odds ratio pooled across the six departments, agrees: —, women's within-department odds of admission slightly above men's. This is exactly the shape of what Bickel, Hammel and O'Connell found across the whole university in 1975 — that once you account for the autonomy of departmental decisions, "there is a small but statistically significant bias in favor of women."
…
Everything above is computed from these counts and nothing else. They are transcribed verbatim from the table the case is built on — anonymised A–F in the original — with the totals and the standardized row added by the recomputation above.
| dept | men appl. | men adm. | men rate | women appl. | women adm. | women rate | who leads |
|---|
Here is where most tellings of this story stop one step too soon, and turn a lesson about arithmetic into a verdict about people. So, precisely:
Conditioning on department explains the aggregate gap — it shows the gap is a consequence of application patterns, not of departments admitting women at lower rates. It does not prove that no sex bias was at work. It relocates the question. Bickel, Hammel and O'Connell said this themselves: women applied disproportionately to the departments that were hardest to enter and, often, least funded — and why they did is not something the admissions table can answer. If earlier schooling and socialization steered women away from the fields that happened to admit more freely, that is a real bias; it has simply moved upstream of the moment a folder lands on an admissions committee's desk. "Controlling for department" makes that upstream bias invisible, not absent.
So the honest reading is double, and both halves matter: the specific charge — that Berkeley's departments admitted women at lower rates — does not survive contact with the disaggregated data, and in fact reverses. And the broader question of why the pipeline looked the way it did is left open, exactly where the data leaves it. The paradox is a warning about a move, not a clean bill of health: whenever a total disagrees with all of its parts, someone is choosing which one to believe — and that choice is doing the arguing.
The same trap turns up everywhere two groups are pooled across uneven conditions:
Treatment A beats B for small stones and for large stones, yet B beats A overall — because the harder cases were steered to A. (Charig et al., 1986.)
A higher share of Republicans than Democrats voted for it — yet within both the North and the South, Democrats voted for it at higher rates. Region was the lurking variable.
One player can out-hit another in both halves of a season and still finish with the lower average, if their at-bats fell in different proportions. (Justice vs. Jeter, 1995–96.)
Simpson's cousin in geography: a correlation across counties or countries can vanish or flip inside each one. The group is not the average of its members.
UCBAdmissions, citing Bickel et al. 1975), recomputed, not constructed. The university-wide 44% / 35% headline is a different, larger population than the six-department subset (44.5% / 30.4%), which is why the two differ; both are labelled as what they are. The standardized rates and the Mantel–Haenszel ratio are this page's own recomputation on the six departments, and they echo — they do not reprint — the full study's finding across all departments. The verdict is about the aggregate-vs-departmental comparison, with the upstream question left open in §V.