At Brown, my primary teaching is an undergraduate class, Econ 1430. I love it, even this semester when it’s only partially in person and everyone wears two masks. The title of the class is “The Economics of Social Policy” and we talk all about education, health and social programs. But, really, it’s a class on statistical methods. At the end of the day, I want my students to understand how to appropriately ask and answer policy questions, with a focus on causality.
I was reflecting on this as I anticipate research over the next months and years about COVID-19. Already we are seeing academic papers on COVID-19 mitigation policy (here is one from last week on masks and restaurants). Down the line we’ll see more on the outcomes of COVID lockdowns: did remote schools matter for long-term learning? Did lockdowns have a long-term impact on the restaurant or hotel industry? And so on.
Not all of these studies will be created equal, and drawing good conclusions from them — conclusions which will help us make better policies in the future — requires understanding which are good and which are not. Today, I’m going to take a little bit of a dive into what I teach in my class. I’ll start by talking through a question which has nothing to do with COVID and then, of course, we’ll circle back.
An example study question
Universal PreK is a policy that has gotten a lot of recent attention. Some states have it, some do not. It’s been argued that it’s key to reducing gaps in school readiness across socioeconomic groups, but it’s also expensive. Is this the best use of resources?
Let’s imagine we want to analyze a slightly narrower policy question: Does PreK raise test scores in kindergarten? How might we go about doing that?
The best and worst analyses
The best way to answer this question would be with a randomized trial. Imagine you took a group of kids and randomly picked half of them to have access to PreK, while the other half did not. You could follow them over time, and see their test scores once they entered school. If you did the randomization right and you had a big enough sample, you could learn the causal effect of PreK on test scores (in this group, anyway). This seems great, but randomized trials are expensive, slow and not always feasible.
The worst thing to do (from a causal standpoint) would be to just compare test scores for kids who did and did not attend PreK programs. Enrollment in PreK is, in general, associated with better resourced families and more educated parents. It would be virtually impossible to separate out the impact of PreK from these other factors. The answer you would get would be wrong.
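To make the contrast concrete, here is a minimal simulation sketch. Everything in it is made up for illustration: the "family resources" index, the true effect of 5 points, and the enrollment rule are all assumptions, not estimates from any real data. It compares the naive PreK vs. no-PreK gap when better-resourced families enroll more often against the gap when enrollment is assigned by coin flip.

```python
import random
from statistics import mean

TRUE_EFFECT = 5.0  # assumed causal effect of PreK on test scores (invented)

def score_gap(n, randomized, seed=0):
    """Return mean score of PreK kids minus mean score of non-PreK kids."""
    rng = random.Random(seed)
    prek, no_prek = [], []
    for _ in range(n):
        resources = rng.gauss(0, 1)  # hypothetical family-resources index
        if randomized:
            attends = rng.random() < 0.5  # coin flip, unrelated to resources
        else:
            # Assumption: better-resourced families are more likely to enroll.
            attends = resources + rng.gauss(0, 1) > 0
        # Resources raise scores on their own, whether or not the kid attends.
        score = 50 + 10 * resources + TRUE_EFFECT * attends + rng.gauss(0, 5)
        (prek if attends else no_prek).append(score)
    return mean(prek) - mean(no_prek)

naive_gap = score_gap(20000, randomized=False)
rct_gap = score_gap(20000, randomized=True)
print(f"naive comparison: {naive_gap:.1f}")
print(f"randomized trial: {rct_gap:.1f}")
```

In this toy setup the randomized gap lands close to the assumed effect, while the naive gap is several times larger because it also picks up the effect of family resources.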
It isn’t a great choice — between infeasible and wrong — and for this reason, in studying questions like this it is very common to exploit location-level policies. Rather than comparing kids who went to PreK or not, compare locations with better PreK access to those with worse. In the COVID space, you can think of the parallel of comparing places with mask mandates to those without. But within these studies, there’s huge variation in the quality of analysis.
Exploiting location-specific policies
To fix ideas: imagine that you have 50 states (okay, you don’t have to imagine that part). And then imagine that in 15 of them there is universal PreK access; the other 35 do not have it. How can you exploit this to study the problem?
The simplest way you might use these data is to compare kindergarten test scores in the states with universal PreK to those without. This is easy to do! You literally just compare the groups.
However: you have a problem very similar to the problem with comparing across kids. The states with the universal PreK may be different in other ways. They may be more likely to have other child care subsidies. They may be richer. They may have better resources available in school. Once again, attributing the differences in test scores to PreK may be challenging (read: impossible).
Much better is to exploit the introduction of universal PreK. Imagine that in one of your 15 states, the universal PreK policy was introduced in 2005 and you observe kindergarten test scores between 2000 and 2015. You could make a graph like the one below. If the data looks like what I’ve shown there, that seems like evidence that PreK mattered.
Time Trends with Controls
But! What if other stuff happened in 2005 across the country? Some federal program, or maybe a change in the test. What if what you’re seeing in your graph is just some overall time trend, not actually anything about PreK? To get around that concern, you might imagine including control states. Pick a state in the same region of the country which did not have this universal PreK policy and add their test scores to the graph.
I’ve put an example below. If the control state looks like the solid line, that’s good. If it looks like the dotted line, you should be worried, because it suggests the change in test scores is there in the control state even without the change in PreK.
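The logic of adding a control state is what is usually called a difference-in-differences comparison. Here is a small sketch with invented numbers: a nationwide bump of 2 points in 2005 (say, a change in the test) and an assumed PreK effect of 3 points. The states, scores and effect sizes are all hypothetical.

```python
from statistics import mean

POLICY_YEAR = 2005
YEARS = range(2000, 2016)

def fake_scores(has_prek):
    """Invented yearly scores: everyone gets +2 from 2005, PreK adds +3."""
    series = {}
    for year in YEARS:
        score = 50.0
        if year >= POLICY_YEAR:
            score += 2.0      # something else that happened everywhere in 2005
            if has_prek:
                score += 3.0  # assumed effect of the PreK policy
        series[year] = score
    return series

def before_after(series):
    """Average score after the policy year minus average score before it."""
    pre = mean(v for y, v in series.items() if y < POLICY_YEAR)
    post = mean(v for y, v in series.items() if y >= POLICY_YEAR)
    return post - pre

treated = fake_scores(has_prek=True)
control = fake_scores(has_prek=False)

# Subtracting the control state's change strips out the nationwide bump.
did = before_after(treated) - before_after(control)
print(before_after(treated), before_after(control), did)
```

Looking at the PreK state alone, the before/after change overstates the effect because it includes the nationwide bump. Subtracting the control state's change recovers the assumed 3 points, but only because the sketch builds in the assumption that both states share the same underlying trend.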
Getting Really Fancy
There’s more! Imagine now that you know the first year of universal PreK in all 15 of your states but they’re not all the same. It would be great to be able to look at them all together on one graph. How do you do that?
Well, think about taking the first year of PreK in each state and calling that year “Year 0”. Then, “Year 1” is the first year after, Year 2 is the second year after and so on. And then “Year -1” is the year before, “Year -2” is two years before and so on. Now even though for some states “Year 0” is 2005, and for others it’s 2007, you can put them all on the same graph. The horizontal axis isn’t year, though, it’s now “Time from Policy”. Check out the example below (I even added fake standard errors). This graph could be a pretty compelling demonstration of the effect of PreK policies.
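The re-centering step is mostly bookkeeping. A minimal sketch, using three hypothetical states with invented adoption years and scores, and keeping only a window of three years on either side so that every state contributes to every point:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical first year of universal PreK in each state.
adoption_year = {"A": 2005, "B": 2007, "C": 2010}
WINDOW = range(-3, 4)  # event times -3 .. 3

def fake_score(state, year):
    """Invented scores: a state-specific level plus 3 points once PreK starts."""
    base = {"A": 48.0, "B": 50.0, "C": 52.0}[state]
    return base + (3.0 if year >= adoption_year[state] else 0.0)

by_event_time = defaultdict(list)
for state, start in adoption_year.items():
    for t in WINDOW:
        # Event time t corresponds to calendar year start + t in this state.
        by_event_time[t].append(fake_score(state, start + t))

# Average across states at each "Time from Policy".
event_study = {t: mean(vals) for t, vals in sorted(by_event_time.items())}
for t, avg in event_study.items():
    print(f"Year {t:+d}: {avg:.1f}")
```

The values of `event_study` are what would go on the vertical axis, with the keys (time from policy) on the horizontal axis.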
You can get even a bit fancier. Assuming your PreK programs kick in at different times in different states, you can adjust for calendar time. And you can incorporate the no-PreK states by (for example) assigning them Year 0 dates at random and including them on the graph.
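One simple way to picture this (a sketch under strong assumptions, not necessarily how a real paper would do it; regressions with year effects are more typical): give each no-PreK state a placebo Year 0 drawn at random from the real adoption years, build the same event-time averages for them, and subtract. All states, years and scores below are invented, with a steady nationwide upward trend baked in.

```python
import random
from statistics import mean

rng = random.Random(0)
adoption = {"A": 2005, "B": 2007, "C": 2010}  # hypothetical PreK states
never = ["X", "Y", "Z"]                        # hypothetical no-PreK states

# Placebo "Year 0" for each no-PreK state, drawn at random from real start years.
placebo = {s: rng.choice(sorted(adoption.values())) for s in never}

def fake_score(year, start, has_prek):
    trend = 50.0 + 0.5 * (year - 2000)  # invented nationwide calendar-time trend
    return trend + (3.0 if has_prek and year >= start else 0.0)

def event_means(starts, has_prek, window=range(-3, 4)):
    """Average score across states at each event time."""
    return {
        t: mean(fake_score(y0 + t, y0, has_prek) for y0 in starts.values())
        for t in window
    }

prek = event_means(adoption, has_prek=True)
control = event_means(placebo, has_prek=False)
adjusted = {t: prek[t] - control[t] for t in prek}

raw_jump = prek[0] - prek[-1]               # includes the calendar-time trend
adjusted_jump = adjusted[0] - adjusted[-1]  # trend differenced out
print(raw_jump, adjusted_jump)
```

In this toy setup the raw jump at Year 0 mixes the assumed 3-point effect with half a point of ordinary year-to-year growth, and the adjusted series removes the latter.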
The Pitfall: Endogenous Policy Change
The most significant issue with this approach (other than getting data, and actually having policy changes that you can use) is the fact that policy changes usually are not random. State leaders do not wake up one morning and decide on a whim to have universal PreK programs. Virtually any large-scale policy introduction is a result of complicated decision-making by policy-makers, various interest groups and so on. As a result, we may be concerned that the factors which lead to this policy change are also leading to test score changes.
For example: if a state is trending more Democratic and slowly adopting more progressive policies of various types, then what appear to be changes as a result of PreK may actually be all the other stuff which is also changing.
This is a hard problem to fix, but one simple approach is to focus in on whether you’re seeing effects before the policy changes. To be more concrete: look at the two graphs below. In the first one, the change in the trend in test scores happens pretty much right when the policy changes. In the second, the trend starts a couple of years before. The first one should make you much more convinced it is this policy, and not other stuff that was already going on.
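A rough version of this check can be written down directly. The sketch below uses two invented event-time series (keys are years relative to the policy) and simply asks how much scores moved before Year 0. A real analysis would also account for noise, for instance with standard errors, which this ignores.

```python
def pre_period_change(series, first=-4, last=-1):
    """Change in the outcome over the years leading up to the policy."""
    return series[last] - series[first]

# Invented data: scores are flat, then jump when the policy starts.
clean = {-4: 50.0, -3: 50.0, -2: 50.0, -1: 50.0, 0: 53.0, 1: 53.0, 2: 53.0}

# Invented data: scores start climbing two years before the policy.
early = {-4: 50.0, -3: 50.0, -2: 51.0, -1: 52.0, 0: 53.0, 1: 54.0, 2: 55.0}

for name, series in [("clean", clean), ("early", early)]:
    change = pre_period_change(series)
    verdict = "no pre-trend" if change == 0 else "pre-trend: be skeptical"
    print(f"{name}: pre-period change = {change:+.1f} ({verdict})")
```

The first series is the reassuring case. In the second, scores were already rising before the policy, so the later gains are harder to credit to PreK.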
None of these approaches are perfect; they aren’t going to live up to the randomized controlled trial standard. But what I think is important to note is that with careful use of data like this — but only with careful use — we can generate compelling evidence.
Back to COVID
You either have seen or will be seeing a lot of this type of analysis in COVID. Take masks. Do mask mandates prevent COVID cases? One way to analyze this is to compare states with mandates to those without. But that’s not great! There are a lot of other things going on.
Much better is to look at them over time, to ask whether case rates decrease (or grow more slowly) after mandates than before. That’s what this CDC paper does, and they have a graph pretty much like the one I made above (which I included below). This graph is pretty compelling! If you squint, there’s a bit of a pre-trend, but it looks pretty good overall.
This will not be the last graph like this you see, mark my words. So, now you’ll know how to read it!