Lots of Studies Are Bad

Emily Oster

10 min Read
One of my children’s favorite places in the world is the arcade at Ryan Family Amusements in Newport, R.I. I will confess to also loving it, and my go-to game is something we call “Ice Zombies,” where you shoot video zombies with a water gun. One of the main warnings the game gives you as you play is “Look out for zombies from both sides!”

I was thinking about this phrase in considering two relatively new studies — one on the value of masking from the CDC, and one on the value of lockdowns out of Johns Hopkins University. These are on very different sides of the COVID divide. The first argues that masks lower the risk of COVID infection by between 56% and 83%. The second argues that lockdowns in general had almost no impact on mortality. The reality is that they are both flawed. There is such a strong tendency to see good in the studies we agree with and bad in the ones we do not. But there are bad studies — just like zombies — on both sides.

I’ll talk through both studies, and then reflect a bit on whether there is any way to see these flaws more clearly when studies are first released.

MMWR on masks

The latest CDC Morbidity and Mortality Weekly Report (MMWR), on masking, is here. The goal of this study was to compare COVID rates for people who masked and who did not, and the primary conclusion was that masking was extremely effective at preventing COVID. The agency made a little graphic, which was widely shared; I’ve included it below, albeit reluctantly given how misleading it is.

The method in this study is what’s called a “test-negative” design. The authors identified individuals in California who tested positive for COVID. They then identified a matched control (someone who was matched on age, sex, and region but who tested negative in the same time frame). They surveyed both the positive and negative individuals about their characteristics: behaviors, masking, vaccination, demographics, etc. They then compared the behaviors of the two groups. The central result they present is that people who tested negative were more likely to report wearing masks than people who tested positive.
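
Mechanically, a comparison like this boils down to an odds ratio from a two-by-two table of mask use by test result. Here is a minimal sketch with invented counts, purely for illustration (these are not the study’s numbers):

```python
# Minimal sketch of the core test-negative comparison: an odds ratio
# from a 2x2 table of mask use by test result. All counts are invented
# for illustration; they are NOT the MMWR study's data.

masked_positive = 300    # tested positive, reported always masking
unmasked_positive = 700  # tested positive, reported not always masking
masked_negative = 550    # tested negative, reported always masking
unmasked_negative = 450  # tested negative, reported not always masking

# Odds of masking among positives divided by odds of masking among
# negatives. An odds ratio below 1 gets read as "masking is protective."
odds_ratio = (masked_positive / unmasked_positive) / (masked_negative / unmasked_negative)
print(f"odds ratio: {odds_ratio:.2f}")  # ~0.35 with these made-up counts
```

The calculation itself is trivial; everything rides on whether the two groups are actually comparable, which is where the trouble starts.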

There are an enormous number of issues with this study. They begin at recruitment and data collection. Only about 13.5% of the negative-test people contacted and 8.9% of the positive-test people contacted agreed to participate. These rates are very low overall, and we should immediately worry that the selection of respondents differs between the two groups. The information about masking was self-reported, which opens up the possibility that various biases could affect the results (for example, “social desirability bias,” where people tell interviewers what they think the interviewer wants to hear).

Even putting aside the low response rates, there are huge issues with the selection of the two groups. Most notably, the groups are totally different in terms of their reasons for testing. In the group that tested positive, 78% of them tested due to symptoms. In the group that tested negative, only 16.7% tested due to symptoms. The most common reasons for testing in this group were that it was required for work or school (43.1%) or that it was required for a medical procedure (16.9%).
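
Differential selection like this can manufacture an apparent mask effect out of nothing. Here is a toy simulation, with entirely invented response rates, in which masks have zero effect on infection, but mask-wearers who test negative are more likely to answer the survey. The same odds-ratio calculation then comes out looking “protective”:

```python
# Toy simulation of differential survey response, with invented numbers.
# Assumption baked in: masks have NO effect on testing positive, but
# maskers who test negative respond to the survey at a higher rate.
import random

random.seed(0)
rows = []
for _ in range(200_000):
    masked = random.random() < 0.5
    positive = random.random() < 0.10              # unrelated to masking
    if positive:
        responds = random.random() < 0.089         # same rate either way
    else:
        responds = random.random() < (0.20 if masked else 0.08)
    if responds:
        rows.append((masked, positive))

def count(m, p):
    return sum(1 for masked, positive in rows if masked == m and positive == p)

# Same odds ratio as before, now computed on the self-selected sample.
or_est = (count(True, True) / count(False, True)) / \
         (count(True, False) / count(False, False))
print(f"odds ratio with zero true mask effect: {or_est:.2f}")  # ~0.40
```

A researcher who saw only the respondents would conclude masks cut infection risk by more than half, even though in this simulated world they do nothing.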

The groups also differed in their degree of vaccination (32.1% in the negative group, 17.6% in the positive group; these data started in February 2021, so vaccination is lower than current rates). The negative group was also higher-income.

Putting this all together, what this study is doing — at the core — is comparing the masking behavior of a select sample of wealthier individuals who are tested primarily for work, school, or medical procedures with an even more select sample of lower-income individuals who are testing primarily due to symptoms. This comparison is virtually impossible to learn from. The authors claim they’ve done sensitivity analyses around some of these issues, but the populations are so fundamentally different as to render the entire exercise meaningless.

To be clear: I’m not claiming that this study shows that masks do not work. What I’m claiming is that this study shows nothing. At best, it perhaps shows that masking is more common in some demographic groups than others (interesting, yet already known). It tells us nothing at all about masking efficacy, because it’s just too poorly constructed.

Johns Hopkins on lockdowns

The other side of the zombie horde is this study, in the form of a working paper out of Johns Hopkins, on whether lockdowns prevented COVID mortality.

This paper is a review and meta-analysis of a large number of papers that have aimed to understand how various movement restrictions impact COVID-19 mortality. This includes both the impact of full “lockdown” measures and the impacts of more specific interventions like school or restaurant closures. Aggregating across papers, the authors argue that these measures had only a quite small impact on COVID-19 mortality.

The most basic issue with this paper is probably in the conception of the question. In general, a meta-analysis aims to combine many small studies of the same question together to estimate an average effect size. The key to that sentence is the phrase “of the same question.” Meta-analysis is well-suited to situations in which it is plausible to argue that many studies are doing the same thing.

An example would be something like a meta-analysis of the impact of having an epidural on the length of labor. In this case, the treatment is well-defined (having an epidural) and the outcome is also well-defined (length of labor).
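
For intuition about what “estimate an average effect size” means mechanically, here is a minimal fixed-effect (inverse-variance) pooling sketch, with invented numbers standing in for, say, per-study estimates of an epidural’s effect on labor length. Note that nothing in the math checks whether the studies were asking the same question; the formula will happily average whatever you feed it:

```python
# Minimal fixed-effect (inverse-variance) meta-analysis sketch.
# Effects and standard errors are invented for illustration.
import math

effects = [0.8, 1.2, 0.5, 1.0]   # per-study effect estimates
ses     = [0.3, 0.5, 0.2, 0.4]   # per-study standard errors

weights = [1 / se**2 for se in ses]  # more precise studies count more
pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))

print(f"pooled effect: {pooled:.2f} (SE {pooled_se:.2f})")  # 0.70 (0.15)
```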

Using meta-analysis for this COVID question meets a challenge at the outset: What is a lockdown? The paper tries to combine many studies evaluating “lockdowns” early in COVID. But that’s not a well-defined concept. New Zealand completely eliminated travel into the country and met any outbreaks with intense restrictions on movement. That lockdown is different from anything implemented in the U.S. And even across jurisdictions within the U.S., the definition of lockdown varied.

Again, the aim of the meta-analysis is to combine papers that are looking at the same question. But the individual papers are not directly comparable, since “lockdowns” varied so much. This matters — at least I think it does — because this paper cannot hope to (for example) address the question of whether a lockdown like the one in New Zealand affected mortality.
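
One standard diagnostic for this problem is a heterogeneity statistic like Cochran’s Q or I², which asks how much the study estimates disagree beyond what sampling noise alone would explain. A minimal sketch, again with invented numbers:

```python
# Cochran's Q and I^2 heterogeneity sketch, with invented study results.
# A high I^2 suggests the studies are estimating genuinely different
# things -- exactly the worry when "lockdown" means different policies.

effects = [-0.5, 0.1, -2.0, 0.3]
ses     = [0.2, 0.3, 0.4, 0.25]

weights = [1 / se**2 for se in ses]
pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)

q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
df = len(effects) - 1
i_squared = max(0.0, (q - df) / q)  # share of variation beyond chance

print(f"Q = {q:.1f} on {df} df, I^2 = {i_squared:.0%}")  # ~89% here
```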

The second issue is that the paper engages in some difficult-to-understand exclusion choices. In a review of this type, the name of the game is to try to identify all relevant papers and then edit them down to those that can be compared and answer the question. The authors initially identify about 18,000 papers, which are eventually whittled down to 34. Yes, you read that right.

The first part of this whittling makes sense — their broad search mostly finds papers that are not about this topic. But the last stage is a bit more complicated. They end up with 117 papers that broadly fit their criteria, and then get down to 34. The details are in the graphic below.

Some of these exclusions make sense — duplicates, or papers that did not actually evaluate mortality. But some of this is more confusing. The authors exclude student papers. They exclude papers with a “time series” approach. They exclude synthetic-control studies.

This last exclusion is an odd one. Synthetic control is an approach frequently used in economics to generate a control group that is better matched to a treatment group (it’s a neat procedure that deserves its own newsletter at some point). It’s difficult to see why you would exclude papers that use it. The authors argue that they do so because one paper using it (which found that Sweden would have benefited from a lockdown) was criticized by another paper. But they don’t seem to apply that kind of criterion to all the papers they include, so I’m left a bit concerned that there is another motivation here.
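
For readers curious what that procedure looks like, here is a minimal synthetic-control sketch with made-up data: choose non-negative weights on control units, summing to one, so that their weighted combination tracks the treated unit before treatment, then use that combination as the counterfactual afterward. (This illustrates the general idea only, not any particular lockdown paper’s implementation.)

```python
# Minimal synthetic-control sketch with invented data. We pick
# non-negative weights (summing to 1) on control units so their weighted
# pre-period outcomes match the treated unit; the post-period gap between
# the treated unit and this "synthetic" unit is the estimated effect.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
pre_controls = rng.normal(10, 2, size=(5, 8))   # 5 controls, 8 pre periods
# Fake treated unit built mostly from controls 0 and 3, plus noise.
treated_pre = 0.6 * pre_controls[0] + 0.4 * pre_controls[3] + rng.normal(0, 0.1, 8)

def loss(w):
    # Squared pre-period gap between treated unit and weighted controls.
    return np.sum((treated_pre - w @ pre_controls) ** 2)

n = pre_controls.shape[0]
res = minimize(loss, np.full(n, 1 / n),
               bounds=[(0, 1)] * n,
               constraints={"type": "eq", "fun": lambda w: w.sum() - 1})
print("weights:", np.round(res.x, 2))  # loads mostly on units 0 and 3

# Apply the same weights to post-period control data to get the
# counterfactual path the treated unit would have followed.
post_controls = rng.normal(10, 2, size=(5, 4))
synthetic_post = res.x @ post_controls
```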

The sheer volume of papers excluded makes it difficult to regard this as a true review.

Other commenters have raised more objections — pointing out that mortality may not be the only metric of interest, for example. For me, the first concern is the most central: this paper is problematic because the question is ill-suited to a meta-analysis of this type.

That doesn’t mean lockdowns worked! Again, it’s not even clear what that would mean, since there is no one definition of lockdown. My point isn’t that this paper is wrong in its conclusions, just that it’s largely uninformative. The authors begin with an interesting graph showing a limited relationship between the stringency of COVID restrictions and mortality. That deserved more study, but this paper isn’t helping us understand it much.

Who should I trust? 

The reaction to these studies was enormously frustrating to me. Both studies are poor. And both were taken up seemingly uncritically by people whose priors they supported. People who oppose the relaxing of mask requirements pointed to the first as proving that masks work and that we should be reluctant to move away from them. People who oppose lockdowns picked up the second study as proving that lockdowns ruined lives with no benefits.

There may be merit to both of these positions. There may be evidence for both of them. But that evidence isn’t enhanced by these papers.

One question is whether, as a consumer, there is any way to actually identify studies that are problematic in these ways. How would you know not to trust them? This can be hard to do when reading on your own, in part because some of the reasons to distrust them are statistically complicated.

The simplest way I have found to impose discipline on your judgment is to consume some media or commentary outside of your echo chamber. This will make you mad! But hear me out. In the example of the mask study, reading the outcry from the political right surfaced a lot of the issues with the study, even if the tone was sometimes a bit much. A similar thing could be said for the outcry from the political left about the lockdown paper.

For a while, I was trying to listen to a conservative podcast during half of my long runs and a liberal one on the other half, just for balance. But even if you do not want to do this regularly (indeed, I eventually had to save my sanity and switch to podcasts about sports), you can still take advantage of it in the moment. If you’re wondering whether to believe a study, start with what the skeptics say.

In these cases, though, there are zombies on all sides. Get out your water gun.
