Pre-kindergarten programs are of great interest to readers here, from both a policy and a personal standpoint. Readers want to know how important it is for their child to be in a “good” pre-K program (whatever that means), and it’s hard to avoid the national or local policy conversation about introducing universal pre-K.

It is a generally accepted view that access to pre-K programs is good for children, which is why when a study came out a couple of weeks ago showing worse outcomes for children enrolled in a particular pre-K program in Tennessee, people kind of panicked. I can usually tell from the number of panicked emails and Instagram DMs how urgent it is to discuss a particular result. This didn’t quite reach the “toxic baby metals” level, but it was getting there.

Before getting into the study, it is important to set the stage. This is just one piece of a broad academic literature on this topic. There is a tendency sometimes to cover research like this as if it is either the first information we have about the question or as if it is inherently better because it’s new. Neither is necessarily true, so it’s always useful to step back and ask what the context is.

Overall, the academic literature tends to find positive impacts of pre-K programs. This goes back to well-known early evaluations of the Perry Preschool and Abecedarian Project that showed gains for children from early childhood education. The literature has evolved from those early studies (see a long review here) to support the value of Head Start and Head Start–like programs. Studies such as this one, of universal pre-K in Boston, have shown long-term impacts on high school graduation, college-going, and SAT test-taking.

This general consensus isn’t to say that every paper finds the same thing or that every result is positive. The most consistent finding is that pre-K raises kindergarten readiness. Effects on later test scores tend to be smaller or zero. And yet in some cases, like the Boston data, there seem to be positive impacts on educational outcomes even without corresponding impacts on test scores. In summary: the literature on pre-K programs is broadly supportive, but it’s not always a straight-line analysis. (By the way, if you are looking for a wonkier run through the literature, Noah Smith also wrote on the topic this week.)

With this background, let’s turn now to the new paper.

What does the paper do and find?

This paper is the latest report out of a larger project that has been evaluating the state-run voluntary pre-K program in Tennessee. This program is fairly typical of state-run pre-K programs: It serves a lower-income population with a voluntary one-year pre-K program. The program provides some quality control to make sure participating centers adhere to a set of standards.

This new paper focuses on the impacts of the program on children who entered in 2009 and 2010; earlier work as part of this project looked at the impacts of the program on kindergarten readiness and on test scores in the third grade. The current paper follows the students through sixth grade and evaluates the impact of the program on test scores, disciplinary infractions, and attendance, among other outcomes.

In broad terms, the primary challenge in evaluating the value of a program like this is that interest in the program isn’t likely to be random. Parents who want to enroll their children in these programs are different from those who do not, and it’s difficult to separate those differences from the program impacts. This project uses a common approach to addressing this problem: randomization among oversubscribed programs. More parents wanted to enroll their children in the program than there were slots available. The limited slots were allocated by random lottery. The evaluation of the program therefore relies on comparing the lottery winners with the lottery losers.

Methodology break!

In a study like this, a key issue in looking at the data is how to deal with the fact that not everyone adheres to their lottery assignment. If you think about the canonical randomized trial (say, of vaccines), issues of non-adherence are limited. People enroll in the trial and they get a shot, but they don’t know (nor do their doctors) whether they are getting the vaccine or a placebo. It’s not like you’re put in the trial, told you’re getting the placebo, and then allowed to choose another option.

Here, the reality is closer to that. In this study, 1,852 children were assigned to treatment (i.e. admitted to the pre-K program) and 1,138 were assigned to the control group. However: of those 1,852 treatment children, only 1,608 attended. Moreover, of the 1,138 control children, 389 of them ended up attending even though they were assigned to control. These changes happen because some families decide it doesn’t work for them even if they have access, and others lobby to get in (perhaps to take the place of those who opt out).
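To make the scale of the non-adherence concrete, here is a small sketch that turns the counts above into attendance rates. The four counts come from the paragraph above; the variable names are my own. Roughly 87% of lottery winners attended, compared with about 34% of lottery losers.

```python
# Counts as reported in the text above.
assigned_treatment = 1852
assigned_control = 1138
treatment_attended = 1608
control_attended = 389

# Share of each lottery group that actually attended the program.
takeup_treatment = treatment_attended / assigned_treatment
takeup_control = control_attended / assigned_control

# How much winning the lottery raised the chance of attending.
takeup_gap = takeup_treatment - takeup_control

print(f"Lottery winners who attended: {takeup_treatment:.1%}")
print(f"Lottery losers who attended:  {takeup_control:.1%}")
print(f"Gap in attendance:            {takeup_gap:.1%}")
```

The gap matters because the lottery changed attendance by only about half, not from zero to 100 percent, so comparing winners with losers understates the difference in what the two groups actually experienced.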

Given this phenomenon, there are two ways to analyze the data. One is to use “intent to treat,” or ITT. This method would compare the lottery winners with the lottery losers, regardless of what they ultimately decided to do. A second way to analyze the data is to use “treatment on the treated,” or TOT, which would compare all children who attended pre-K with those who did not, regardless of their lottery outcomes. The authors in this case show both results.

In general, the ITT results should be the focus, since these are the results that are driven by the random assignment. It may seem more sensible to look at comparisons between kids who actually attended pre-K and those who did not, but because of the non-adherence issues, that comparison doesn’t have an obvious causal interpretation. The choice of attendance isn’t random. Only the lottery assignment is random. So when I talk about results below, I’m going to have in mind the ITT ones.
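If it helps to see why the attended-versus-not comparison can mislead, here is a toy simulation. Everything in it is hypothetical: the made-up "advantage" variable, the attendance probabilities, and the assumption that more advantaged families are more likely to attend (in real data, the selection could run in either direction). The program's true effect is set to zero, so any difference the comparison finds is spurious.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible


def simulate(n=20000, true_effect=0.0):
    """Simulate a lottery with imperfect adherence (all numbers hypothetical)."""
    rows = []
    for _ in range(n):
        advantage = random.gauss(0, 1)   # unobserved family/child factor
        won = random.random() < 0.5      # the lottery itself is random
        # Winners mostly attend, losers sometimes do; in this toy world,
        # more advantaged families are more likely to attend either way.
        base = 0.85 if won else 0.35
        p_attend = min(max(base + 0.1 * advantage, 0.0), 1.0)
        attended = random.random() < p_attend
        score = advantage + true_effect * attended + random.gauss(0, 1)
        rows.append((won, attended, score))
    return rows


def mean_difference(rows, flag_index):
    """Mean score where the flag is True minus mean score where it is False."""
    yes = [r[2] for r in rows if r[flag_index]]
    no = [r[2] for r in rows if not r[flag_index]]
    return sum(yes) / len(yes) - sum(no) / len(no)


rows = simulate()
itt = mean_difference(rows, 0)     # lottery winners vs. lottery losers
naive = mean_difference(rows, 1)   # attended vs. did not attend

print(f"Winners vs. losers (ITT):    {itt:+.2f}")
print(f"Attended vs. did not attend: {naive:+.2f}")
```

In this setup the winners-versus-losers comparison lands near zero, which is the truth, while the attended-versus-not comparison shows a sizable "effect" that comes entirely from who chose to attend.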

End of methodology break!

The project, in the end, is able to follow about 2,400 students through the sixth grade. Earlier work on this project showed that children enrolled in the program had higher achievement scores at the end of pre-K. However, this work also showed the surprising result that third-grade test scores were lower in the students who attended the pre-K program. Effectively, this paper is a follow-up to see if that pattern persisted.

It did.

The authors find that students who were randomized into the pre-K program had lower reading, math, and science scores in sixth grade. They were more likely to have been held back, more likely to have an Individualized Educational Plan (a marker for engagement with Special Education), and had more disciplinary infractions. These effects are statistically significant, and largely consistent across the many, many different ways the authors analyze the data.

The size of the effects is small to moderate. The test-score changes, for example, are in the range of 0.1 to 0.2 standard deviations. That’s a large enough effect to pay attention to, but not enormous.
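For a rough sense of what 0.1 to 0.2 standard deviations means, here is a sketch that converts those shifts into percentile ranks. It assumes test scores follow a normal (bell curve) distribution, which is my simplification and not something taken from the paper, and it asks where a student who would otherwise sit at the median would end up.

```python
from statistics import NormalDist

# Assumption: scores are normally distributed, measured in standard deviations.
standard_normal = NormalDist(mu=0, sigma=1)


def percentile_after_shift(shift_in_sd):
    """Percentile rank of a formerly median student after a shift in SD units."""
    return 100 * standard_normal.cdf(shift_in_sd)


for shift in (-0.1, -0.2):
    rank = percentile_after_shift(shift)
    print(f"Shift of {shift:+.1f} SD: roughly percentile {rank:.0f}")
```

Under that assumption, the median student moves from the 50th percentile to somewhere around the 42nd to 46th: noticeable, but not a collapse.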

So: that’s the paper. It has a reasonably strong causal argument and shows that this pre-K program seemed to worsen child outcomes later.

What do we make of this?

What is striking about this paper is that it stands in contrast to much of the literature I discussed at the start. Existing work, like that study in Boston, doesn’t necessarily show long-term positive test-score impacts, but it also doesn’t show these kinds of declines. And other papers find improvements in behavior, not worsening. This paper shows the opposite, and, to be honest, it isn’t clear why.

One option is that the other studies were wrong and this study is correct. That, in fact, pre-K programs are, at least in some ways, detrimental to child development.

A second option is that this study is wrong. The results could have occurred just by chance, or maybe some aspect of the recruitment caused errors. Certainly in any study we look at, a skeptical person could find things to complain about.

In my view, however, both of these arguments are too simplistic. This study isn’t perfect and neither are those that came before, but it does have merit, and so do the results that pre-date it. My guess is that all of these results are “correct” in their own context. The challenge is in how we think about bringing them together.

There are at least two big pieces to that. The first is thinking about the timing of the outcomes that are considered. Nearly all of these papers show that access to pre-K improves readiness for kindergarten. They also nearly all show that impacts on test scores fade over the early school years. However: the study in Boston, along with evidence from other contexts (like this paper on the value of kindergarten teachers), suggests that there may be delayed effects that show up only in high school and beyond. It’s possible that a lot of the evidence we observe could be organized, or at least organized better, by thinking clearly about the timing.

The second big piece — and, in the end, probably the most obvious explanation for why these results differ from others — is that the impact of pre-K programs varies. This variation could be across time, location, or characteristics of children. The program in Boston reached the overall population rather than being targeted to a very low-income group. And it was less recent: the data covered 1997 through 2003, rather than 2009-2010.

An important consideration may be the outside option for children in these different locations or time periods. Others have pointed out that pre-K programs in earlier years may have shown large effects precisely because the alternative programs were less sophisticated. In a world where we better understand the value of pre-K, the programs that parents enroll their children in even if they lose the lottery may be better. This would lead us to estimate smaller positive effects for the pre-K programs we’re evaluating.

A version of this critique shows up in studying charter schools. When you look at data on the impact of charter schools on children’s outcomes, you will find that the value of the schools in terms of increasing test scores is greater in areas where the alternative school performs worse on these tests. In areas with high-performing district public schools, the impacts of charter schools are sometimes negative. A similar thing could be true here: the impact of introducing a pre-K program broadly may depend on the alternatives families have.

This is all a bit in the weeds, and, ultimately, it is probably the job of the literature to try to sort out how we can take all the results and organize them more coherently. The question of how we think about universal pre-K programs is an important one. The policy conversation here has stalled waiting for Build Back Better, but at more local levels these programs are expanding, and I suspect in the long run this will return to the federal policy priority list. As these policy conversations happen, it will be crucial to try to understand the literature overall.

However — and this is perhaps a key point for the panicked email senders — this particular study is not very useful in individual decision-making. Given how much variation there seems to be in the impact of pre-K across programs, children, and locations, it would be a big mistake to take the results from this study and apply them in an uncritical way to your own choices. At best, we learn from it that not all pre-K programs yield positive results on all dimensions. So there’s value in thinking carefully about the program you choose. But that value was there before, and it would be there even if this particular study had shown something different.