Moneyball for Medicine, Anyone?

May 06, 2015

Like many nerdy white guys of my generation, I’ve found the life story of Bill James inspiring ever since I began reading his Baseball Abstracts thirty years ago this month. A night watchman at a pork and beans cannery, James began writing up statistical analyses of baseball questions (for example: Q. When does the average player peak in performance? A. Age 27, well before peak fame) with an unprecedented level of insight and judgment. He began self-publishing his books in 1977, which nurtured a far-flung but lively community of “sabermetricians” who shared his commitment to data-based argument rather than appeals to authority.

James’ perceptions and intuitions weren’t always right, but explaining his arguments in clear prose to other baseball stats obsessives encouraged him to refine and elaborate his views. Today, James’ followers hold many of the top front office jobs in baseball. His methods have been carried over into other sports, especially basketball.

He changed the world.

On the other hand, it’s not clear if the world as a whole is better off from turning sports statistics into a Safe Space for straight white males to practice their pattern recognition skills without much risk of irritating everybody else over what they discover. 

There would appear to be many fields more important to human happiness than sports statistics that could use a Bill James. Yet, while the public discussion of sports data is vastly more sophisticated today than when I was young, many more important arenas for quantitative analysis haven’t improved, or have even regressed.

For example, the New York Times this week splashed heavily a massive study of income mobility over the generations by MacArthur genius grant winner Raj Chetty. The Harvard economist’s goal is to discover which counties have done a better job of nurturing youths in the 1980s-90s to earn higher incomes today. If we can only discover what some places are doing right to battle inequality, then the rest of the country can adopt their best practices.

But Chetty’s vast undertaking has turned into a fiasco of comic results, such as: “Manhattan is extremely bad for children in families in the top 1%. It is among the worst counties in the U.S.” In contrast, Chetty finds that various meth lab-ridden counties in West Virginia are better launching pads for the children of the One Percent.

Chetty released an earlier version of his findings in July 2013. It was even worse back then, as I pointed out at the time. If Chetty were part of the community of baseball analysts, he’d have been embarrassed. But as an adviser to Hillary Clinton writing about inequality, he simply is less exposed to rigorous criticism than if he were a grad student opining about infield shifts.

Of course, the biggest reason why we seem as a culture to be regressing statistically is because of the War on Stereotypes. Today we have so much data demonstrating that yesterday’s deplorable prejudices – blacks tend to be crime prone, Jews tend to be good with money, and so forth – are indeed utterly true that social status has become dependent upon mouthing denials of reality. One big problem with lying, unfortunately, is that most people come to believe their own words.

But that raises the question of whether there might be alternative Safe Spaces to baseball statistics that won’t get white guys fired for noticing, while still doing the world more good than optimizing batting lineups.

For example, what about medicine? Could doctors get better at prescribing pills by applying some Moneyball techniques? Is there a need for a Bill James of pharmaceuticals?

You may have noticed that doctors, being busy professionals, don’t always have time to ponder deeply over which precise pill you should take. For example, back in 1997 my doctor prescribed Mevacor for my cholesterol. Why? Because he had some freebies in his drawer that the Mevacor salesman had given him. When I got home, I went on the Internet (a very new thing in 1997), then came back to him and said, “Why not Lipitor? That seems to have been more effective in clinical trials.” (Lipitor went on to be the biggest pharmaceutical moneymaker of all time.) He appreciated having somebody research this for him.

Last week the young psychiatrist who writes under the name Scott Alexander posted on his SlateStarCodex blog a long study he’d done of online ratings of depression medicines by patients. Doctors are seldom encouraged to pay attention to unsolicited end user opinionizing, but Alexander discovered that the three most usable sites, including Drugs.com and WebMD, had large sample sizes and high positive correlations among their ratings of medicines. And the user evaluations passed a series of clever tests that he devised: for example, antidepressants that are chemically identical but are sold under different generic names (for cunning intellectual property reasons) were rated almost identically.

But then he compared the average of patient ratings of antidepressants to doctors’ ratings, and found that they were slightly negatively correlated. Indeed, “the more patients like a drug, the less likely it is to be prescribed.”

Even more strikingly:

I correlated the average rating of each drug with the year it came on the market. The correlation was -0.71 (p < .001). That is, the newer a drug was, the less patients liked it.
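The check Alexander describes is just a Pearson correlation between each drug’s market-entry year and its average user rating. Here is a minimal sketch of that calculation, using made-up illustrative numbers rather than his actual dataset:

```python
# Hypothetical data: (year on market, average user rating) for a handful
# of antidepressants. These numbers are invented for illustration only;
# they are not Alexander's data.
drugs = [
    (1961, 8.1),
    (1987, 7.4),
    (1991, 6.9),
    (1998, 6.2),
    (2011, 5.5),
]

def pearson(pairs):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

r = pearson(drugs)
print(round(r, 2))  # → -0.96 with these illustrative numbers
```

A value near -1 means the later a drug hit the market, the worse its patients rated it — the same pattern Alexander reports at r = -0.71.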

This is not to say that there are no respectable possible reasons for these negative correlations. A lively debate ensued among Alexander’s highly perceptive commenters. Douglas Knight summed up the discussion:

Two explanations really attack the negative correlation. One is that patients and doctors make different trade-offs between short-term and long-term risks. The other theory is selection bias and anchoring on the patients. A patient that tries a rare drug (rare b/c disliked by doctors) after many have failed may be more impressed than a patient who got the same results from a first drug.

Still, whatever the reasons for these patterns, they are interesting. I had never heard of them before. In fact, none of Alexander’s readers seemed to have seen this kind of research before either.

Why aren’t there more informal but impressive Bill James-style analyses like this in the field of healthcare? Why not encourage highly intelligent doctors such as Alexander to discuss all available data online with highly intelligent patients?

The traditional view is that legitimizing forums where patients and doctors discuss medicine together would let Jenny McCarthy-style crackpotism run amok.

That’s quite possible.

Still, I have a certain amount of faith in intellectual elitism actually working, as long as the elitists go into the marketplace of ideas and take their lumps. It was pretty obvious in 1985, say, that Bill James was better at baseball statistics than anybody else before him. You could feel yourself getting better at thinking just by reading him.

One part of the problem in medicine is the assumption that medicine ought to be like physics, and therefore, like the Law of Gravity, what works for some should work for everybody. That’s how Science with a capital S works, right?

But in reality, Zoloft works for some people and Prozac works for some people, but not exactly all the same people. And the order in which your doctor prescribes Zoloft or Prozac to you can be a very big deal to you at the time when you are deeply depressed. Increasing the accuracy of doctors’ rank ordering of medicines to try out on individual patients would lead to a big absolute increase in human happiness, but drug companies don’t have much incentive to fund expensive research — such as how to figure out whether this individual patient should start with Zoloft or Prozac — that probably wouldn’t increase overall sales.

But if the research could be made cheaper by utilizing the propensity of people to sound off online for free, then perhaps the accuracy of prescriptions could be increased.

My impression is that the medical research profession is not very open to Yelp-style big data collection. They have pretty good reasons for being skeptical about the data. But, like the man said, quantity can have a quality all its own.

The reigning view is that all medical studies must have professional quality control. We are familiar with FDA studies to make sure that new drugs don’t cause Thalidomide-like horrors. But that’s an overly rigid viewpoint when it comes to helping doctors choose among medicines already approved by the FDA as safe.

We know that doctors are currently influenced by not-exactly scientific techniques such as hiring NFL cheerleaders as saleswomen. So why not consider online reviews? Obviously, they shouldn’t be trusted naively, but the occasional MD with a Bill James brain can provide critical consideration.

Perhaps the problem is that we don’t really like doctors laying all their cards on the table the way Bill James would do with his baseball theories. We prefer to imagine our doctors know more than we could ever imagine. If they told us their reasoning, it would take away their juju powers to heal us.

Finally, there’s the fear that big money has corrupted doctors into prescribing the wrong medicines, and we don’t really want to find that out.

Heck, even with Bill James, drugs seemed to corrupt him: the word “steroids” barely crossed his lips until about a decade-and-a-half into the era of steroids distorting statistics, when in 2009 he published a risible piece in Slate claiming that Barry Bonds’ late-career surge might have had more to do with the type of wood in his bat than with PEDs.

But that’s a sin for which I’ll forgive him, especially if he becomes a role model for analysts outside of sports.
