The Rise and Fall of Statistics

July 30, 2014

We live in a century of nonstop adulation over how statistical analysis of big data is changing the world. Brad Pitt, for instance, starred in a successful Hollywood movie, Moneyball, about the fast-changing realm of baseball statistics.

Last week, however, Andrew Gelman, a professor of statistics at Columbia, offered some heresy about his fashionable field on his blog Statistical Modeling, Causal Inference, and Social Science. In a post entitled “A World without Statistics,” Gelman reflected:

“A reporter asked me for a quote regarding the importance of statistics. But, after thinking about it for a moment, I decided that statistics isn’t so important at all. A world without statistics wouldn’t be much different from the world we have now.”

Indeed, the world probably wouldn’t look all that much different. A Rip Van Winkle baseball fan awoken from a long nap might wonder why the games take so long these days, but little else is visibly different.

Gelman’s dissent is much like the long-running debate over how much computers and the other appurtenances of the information age have actually changed the world. For example, the physical structure of the street I live on hasn’t changed much since the subdivision was constructed around 1950.

The houses now have air conditioning and the cars are more aerodynamic, but overall the onrush of technology has become more subtle than in the two generations preceding the mid-century. In contrast, back in 1885, automobiles and electric power lines existed mostly in inventors’ imaginations.

A careful observer of my street would note that there are now more cables running from the telephone poles into the houses, and the children come outside to play less often. The Information Age has its pleasures, but they are less imposing than the preceding era’s thrills of rocketing about the landscape at ever increasing speeds.

My father’s first aeronautical engineering job after graduating from junior college in 1938 was designing a small part for a flying car. After all, the world had moved so swiftly from mules to motorcars that it seemed inevitable that cars would soon fly.

While he never was so crass as to articulate this, my career in the information age—supermarket scanners, personal computers, the Internet—always struck me as technologically anticlimactic compared to his career working on flying machines.

The more Dr. Gelman thought about his dismissive comment, the more he came up with examples of how the world is a nicer place because of advances in statistical theory and practice. And yet …

“When I started writing this post, I was thinking that statistics doesn’t really matter, but I think that’s because I was focusing on some of the more highly-publicized but less beneficial applications of statistics: the use of statistical experimentation and inference to get p-values for tabloid-bait scientific papers, or for Google, Amazon, etc., to perfect their techniques for squeezing money out of their customers or, even at best, to test a medical treatment that increases survival rate for some rare disease by 2 percentage points.”

(As a patient in a 1997 statistical study of the first successful monoclonal antibody for fighting cancer, rituximab, I would demur that statistics really matter if you happen to be one of the two percentage points.)

“But statistics is central to how we think about the world. I still think that statistics is much less central to our lives than, say, chemistry. But it ain’t nothing.”

Without progress in chemistry over the last couple of centuries, our technological goods would be made out of pig iron and leather and fueled by coal. We couldn’t have, say, airplanes.

In contrast, we almost certainly would have had airplanes without much progress in statistics, just as we had airplanes before we had computers. My father worked as a mid-level engineer at Lockheed from the 1930s into the 1980s, using mostly his slide rule until he broke down and bought an electric calculator around 1973.

A couple of years later, while he was helping me with my math homework, I remarked that his expertise in calculus must come from using it all the time on the job. No, he said, he just remembered it from school; surprisingly few fellows at Lockheed used calculus. If they really needed to know the area under a curve, they would cut out little rectangles of graph paper with scissors. The only time he’d used calculus on the job was when he was first working at the flying car company.

These anecdotes of slide rules and scissors point out that there are more ways than one to skin a cat. In particular, much of the modern technological world was hashed out before modern statistical concepts existed even on the blackboard.

How did people manage to get by before modern statistics? First, as old catcher Yogi Berra likes to say, “You can observe a lot just by watching.” Yogi didn’t need a lab coat and a clipboard to get a sense of what pitch Bob Feller tended to throw when the count was 2-2. (Fastball.)

For instance, soccer statistics have radically improved over the last few years. From these vast compilations of numbers we can now finally observe that the two best soccer players are … Lionel Messi and Cristiano Ronaldo, the two players who were already recognized as the best by hundreds of millions of people watching them on TV.

But the other way people got by is that they didn’t: things used to crash and kill you more often.

Lockheed’s F-104 fighter jet, which flew in March 1954, less than a year after Lockheed’s first IBM computer was installed, was designed and built with few of what we in the information age would consider the essentials. Yet, it still went 1,400 mph, which is a lot faster than you or I are ever likely to go in the 21st century now that the Concorde is gone.

It was a heroic era of technology, but not a safe one. The F-104 crashed so frequently that the German press called it “the Widowmaker” and “the Flying Coffin.” After a while, the top men at Lockheed lost interest in their “missile with a man in it” and handed it off to less creative workers like my father to sweat out ways to keep the German pilots from dying so much.

Eventually, they applied some of what they learned to the L-1011, a workhorse wide-body jetliner that had a much better safety record than its look-alike 3-engine rival, the DC-10.

One reason that statistics have seemed new and cool in this century—witness the Freakonomics fad of 2005 and the Nate Silver mania of 2012—is that the basic conceptual tools of modern statistics, such as correlation, were not developed until puzzlingly late, even though the math is not particularly difficult.

Thus, our culture is still learning how to think statistically. Consider all those irritating people who over the last few years have taken to commenting whenever there’s a study they don’t like: “Correlation is not causation!” Well, sure, that’s lame, but that’s actually an improvement over what they had previously assumed.

You might think that humanity’s long failure until very recently to develop sophisticated statistics was because nobody had much data to play around with. Yet there were massive censuses thousands of years ago. The Gospel of Luke notes, “In those days a decree went out from Emperor Augustus that all the world should be registered.” A millennium or more earlier in the Book of Numbers, God tells Moses to enumerate all the Hebrew men old enough and able to fight in the military. Moses comes up with 603,550.

That’s a pretty big number. And yet, for most of human history, nobody seemed terribly motivated to innovate techniques for analyzing human quantities much beyond counting them. This apathy stands out when contrasted to the prodigious mathematical breakthroughs made in support of astronomy and physics. In fact, much of the foundational work in statistics, such as the development of the normal distribution by Gauss and Laplace in 1809-1810, was to help astronomers deal with the random errors in their observations.

A generation later, the Belgian astronomer Adolphe Quetelet applied the normal distribution for the first time to human data (a coat maker’s table of the chest sizes of 5,738 Scottish soldiers) and showed they formed a rough bell curve.

Stephen M. Stigler, a professor in the U. of Chicago department of statistics, argues that the “statistical enlightenment” didn’t begin until 1885, when Francis Galton made sense of the fundamental stumbling block of “regression toward the mean.” That was 198 years after physicist Isaac Newton kicked off the Enlightenment by publishing his “Mathematical Principles of Natural Philosophy” that explained “The System of the World.”

The lag between Newton and Galton is puzzling because the latter, an amateur mathematician in his 60s, was no Newton. To this day, the stereotype that calculus takes more brains than statistics has endured. For instance, the Advanced Placement test in statistics for high school students, which wasn’t introduced until 1996 but has become highly popular, is widely considered easier than the AP tests on Newton’s invention, calculus. And yet tremendously significant methods were still waiting to be discovered by Galton in the 1880s.
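Regression toward the mean, the stumbling block Galton cleared up, is easier to see in a simulation than to explain in words. The sketch below is only an illustration under stated assumptions: heights are in standard units, the parent-child correlation of 0.5 is a made-up figure rather than Galton's estimate, and the function names are hypothetical.

```python
import random
import statistics


def simulate_pairs(n=20000, r=0.5, seed=0):
    """Return (parent, child) heights in standard units with correlation r.

    r = 0.5 is an assumed value chosen for illustration only.
    """
    rng = random.Random(seed)
    # Scale the independent noise so children have the same spread as parents.
    noise_scale = (1 - r * r) ** 0.5
    pairs = []
    for _ in range(n):
        parent = rng.gauss(0, 1)
        child = r * parent + noise_scale * rng.gauss(0, 1)
        pairs.append((parent, child))
    return pairs


def averages_for_tall_parents(pairs, cutoff=1.5):
    """Average parent and child height among parents taller than cutoff."""
    tall = [(p, c) for p, c in pairs if p > cutoff]
    avg_parent = statistics.fmean([p for p, _ in tall])
    avg_child = statistics.fmean([c for _, c in tall])
    return avg_parent, avg_child


pairs = simulate_pairs()
avg_parent, avg_child = averages_for_tall_parents(pairs)
child_spread = statistics.pstdev([c for _, c in pairs])

print(f"tall parents average {avg_parent:.2f} standard units above the mean")
print(f"their children average {avg_child:.2f} standard units above the mean")
print(f"spread of all children: {child_spread:.2f}")
```

Under these assumptions the children of unusually tall parents are still taller than average, but by only about half as much as their parents, while the children as a whole are just as varied as the previous generation. That combination is what made the phenomenon so confusing before Galton.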

Galton had been struggling since 1859 with the quantitative puzzles about genetic diversity in plants, animals, and people implied by his cousin Charles Darwin’s book The Origin of Species. The mathematical sophistication of Galton’s 1869 book Hereditary Genius is nugatory compared to the supreme elegance of the famous equations in physicist James Clerk Maxwell’s 1865 paper, “A Dynamical Theory of the Electromagnetic Field.” Galton’s genius lay instead in asking a long series of questions, such as how to apportion the effects of nature and nurture, that now seem obvious but apparently seldom troubled anyone before him.

Galton kept at his self-appointed task and, aided by professional mathematicians, continued to make breakthroughs into the 20th century. His seminal “wisdom of crowds” paper, “Vox Populi,” was published in 1907 when he was 85.

But if an octogenarian is making theoretical breakthroughs in your field, that suggests he doesn’t have enough competition.

As best as I can understand this pattern over the millennia, celestial questions involving geometry were traditionally seen as higher, purer, cleaner than earthier matters, and thus they attracted the finest minds. Mucking around with human data must have seemed sublunary, déclassé.

To this day, statistical reasoning still strikes many as being in dubious taste. Larry Summers was let go as president of Harvard after pointing out that males tend to have larger variances in IQ than females. Similarly, Jason Richwine lost his job last year after it was revealed that his Harvard doctoral dissertation, IQ and Immigration Policy, was a sophisticated statistical analysis of matters of grave national import.

Here’s a way to test Dr. Gelman’s theory that the rise of statistics has changed the world less than the hype suggests: certain uses of statistics are banned by law. Does the world look much different where thinking statistically is punished?

For example, the federal government forbids lenders from using race or ethnicity in deciding whether or not to approve a mortgage. To enforce this prohibition, it collects a vast quantity of data under the Home Mortgage Disclosure Act to empower community NGOs like the late ACORN to more easily sue lenders for disparate impact discrimination.

In turn, the government nurtured a business culture in which anything you observed just by watching about racial differences in creditworthiness was something you couldn’t discuss publicly, or in emails that could be subpoenaed. Thus when ambitious corporate executives like Angelo Mozilo of Countrywide Financial and Kerry Killinger of Washington Mutual boasted that they would expand rapidly by loaning more to Hispanic victims of prejudicial redlining, no one, as far as I’ve been able to find in six years of looking, publicly criticized their assumption of irrational bigotry against minorities.

So, here’s a case where our culture laboriously emasculated itself of one obvious use of statistics. And what has been the result? Well, the world looked much the same. Countrywide and Washington Mutual flew even higher and faster. The only problem was that mortgage lending in the heavily Hispanic “Sand States” kicked off the Global Financial Crisis of 2008.

Without good statistics, the world would be quite similar, except for when it’s crashing and burning.