In the past few months, especially as school has started, there has been a lot of discussion of pandemic student test score losses. The latest iteration of this is a report released by the U.S. Department of Education several weeks ago, showing declines that erased two decades of test score growth.
There are many open questions here. Exactly how large are the declines? Why did they happen? What can help? Will they improve on their own, and how fast?
In today’s post I want to unpack a bit of what we already know about the answers to these questions from various data sources, both about test score declines and recovery. This post veers a little bit technical, but I think it is worth it to develop an understanding of these data, as we will continue to get information from them over time.
The TL;DR here is that we do not yet have good answers to every question we’d want to ask, and it will probably take decades of research before we do. Even with what we do have, there is confusion. Not every set of test scores measures the same thing, or the same people. When we confront questions like the role of remote learning in test score losses, it’s not always obvious how to separate remote schooling from other pandemic issues.
It is important to note at the outset that test scores are only one measure of success at school. They do not measure love of learning, or mental health, or kindness, or any of a million other things we care about. It is also the case that any testing is only an imperfect measure of learning, capturing a particular set of skills at a particular moment. However: testing is one consistent measure of learning that we have, and therefore in my view it makes sense to study it.
Sources of U.S. test score data
Let’s start with the basics: Where do we get national data on test score performance for kids? There are really three main sources. This list isn’t comprehensive, and there are other sources of private test score or assessment data. However: these are the main data sources that we can use for large-scale assessment data, to look at big trends and changes.
First, the NAEP (National Assessment of Educational Progress) is a series of assessments run by the Department of Education. These are our most consistent and comprehensive source of nationally representative test score data (they are sometimes called the “Nation’s Report Card”). The NAEP data includes a variety of assessments in various subjects, although reading, math, and science are the main focus.
The NAEP data is given to a sample that is intended to be nationally representative, though it’s not given to all students. In reading, for example, the latest large-scale NAEP assessment in 2019 was given to about 150,000 children in grades 4 and 8 and about 26,000 in grade 12.
In addition to its main assessments, the NAEP runs a “long-term trend” assessment, which allows for consistent comparison of test scores from the early 1970s to the present. This is a smaller data set, and it is the test from which the program recently released data. The most recent report covers 7,400 students age 9 from 410 schools, tested in math and reading. Most of the schools sampled in 2022 were also covered in 2020, so the comparisons in that report are based on a constant sample.
Two notes. First, the NAEP is expected to release more detailed data from its main assessments later this winter. Second, NAEP tests were not performed in 2021, so there are no metrics from the 2021 school year.
State test score data
All individual states do student testing at the end of each school year, and produce data on student performance. This type of “high-stakes testing” is required at the state level to get various types of federal funding. It is these tests that are typically used, for example, to define schools as failing, or rank schools or districts. When you hear a statistic like “43% of third graders in Minnesota are performing at or above grade level in reading,” that statistic is coming from state test score data.
Data from individual states are very useful for research because they are comprehensive — in a typical year, effectively all children take these tests — and often publicly available, identified at the district and school level, and often broken down by demographic group. One downside of these data is that they are not comparable across states, and sometimes states change their assessments over time. This lack of cross-state comparability matters. It is not meaningful to compare the “pass rate” on state tests in Texas to the pass rate in Rhode Island, since the tests are different and are normed differently, so passing means a different thing in each location.
From the pandemic perspective, state test score data is useful because these tests were given in many states at the end of 2021 and then again in 2022. The 2021 testing is more haphazard (not all states did it, and participation in some states was very low), but there is sufficient coverage that in many states it will be possible to look at the recovery from 2021 to 2022.
NWEA data

A final source of data is from the education nonprofit NWEA, which runs assessment testing (called the MAP Suite) for a large number of districts across the country. These tests are used by districts for evaluation and curriculum planning. The NWEA develops technical reports (e.g. this one) using its data, with the goal of helping teachers and administrators identify student needs.
These tests were administered in at least some locations during 2021. They are generally given multiple times a year, providing a good window into growth even within a school year.
Pandemic test score declines
It is clear from all three of these data sources that student test scores declined during the pandemic.
The recently released NAEP data provides the best consistent metric of this, showing a sharp decline in both reading and math scores. The main graphs are below. The particular numbers — the scores themselves — do not mean much to most of us, but the magnitude of the drop is strikingly large relative to historical changes.
Pandemic score declines can also be seen in state data. This is a topic I have worked on directly, using data produced as part of the COVID-19 School Data Hub. Together with several co-authors, we used data from state assessments in 11 states to show a decline of almost 13 percentage points in pass rates in math, and 7 in English Language Arts. It is harder to compare these to historical changes, but the declines are way beyond typical year-to-year variation.
The NWEA data shows similar declines. This paper compares growth in individual student learning from 2017 to 2019 versus 2019 to 2021 and shows clear evidence that growth is much lower during the pandemic.
Perhaps not surprisingly, in all of these data sources there is evidence that the test score declines are larger for students living in poorer school districts and for students of color. The NWEA analysis, which is at the student level, shows that students from lower-income families lost more, as did Black and Hispanic students.
Overall, the picture is clear: large test score declines, especially for the most vulnerable student populations.
Impact of remote learning
The changes in test scores appear effectively everywhere, but are not the same size. Not all students lost the same amount, and not all schools and districts lost the same amount. A natural question is what drives this variation. Given the landscape for schools over the 2020-21 school year, perhaps the most salient question has been about the role of hybrid or remote learning. Are losses larger when students did not have access to in-person school?
This is a difficult question to answer well for several reasons. The most significant issue is that it is difficult to separate schooling mode from other aspects of the pandemic. Areas with closed schools also tended to have more stringent lockdowns on other dimensions. Because of the political dimensions of school closures, areas with open schools also had lower vaccination rates and (as a result) higher death rates. Isolating the impact of schools is hard.
There are at least two papers that have tried to isolate the impacts of remote schooling. The first is our paper analyzing state test scores. This paper can be found here (it’s forthcoming in the American Economic Review: Insights). Our primary approach is to compare changes in pass rates on state-level tests within small areas. We focus on within-commuting-zone variation, so our analysis compares the changes in test scores for areas that are close enough that they are considered in the same commuting zone, but had different degrees of open schools.
Our estimates suggest that in-person schooling was very protective against test score declines, with fully remote districts predicted to decline by 13 percentage points more than fully in-person ones. The losses in areas with hybrid schooling fell between those two extremes.
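To make the logic of a within-commuting-zone comparison concrete, here is a minimal sketch in Python. The data, column names, and numbers are entirely made up for illustration — this is not the paper’s actual code or data. The idea is to demean both variables within each commuting zone, which strips out anything shared by all districts in the zone (local lockdown stringency, local COVID severity), and then look at the remaining relationship between in-person schooling and pass-rate changes.

```python
# Toy illustration of a within-commuting-zone comparison (hypothetical
# districts, not real data). Districts in the same commuting zone are
# compared by demeaning both variables within the zone, then fitting a
# simple regression slope by hand.
import pandas as pd

# Hypothetical district-level data: change in pass rate (2019 to 2021,
# in percentage points) and share of the year spent in-person.
df = pd.DataFrame({
    "commuting_zone": ["A", "A", "A", "B", "B", "B"],
    "in_person_share": [0.0, 0.5, 1.0, 0.0, 0.5, 1.0],
    "pass_rate_change": [-18.0, -12.0, -5.0, -16.0, -9.0, -3.0],
})

# Demean within commuting zone: removes any factor that is constant
# across districts in the same zone.
for col in ["in_person_share", "pass_rate_change"]:
    df[col + "_dm"] = df[col] - df.groupby("commuting_zone")[col].transform("mean")

# Slope of the demeaned regression = within-zone association between
# in-person share and the change in pass rates.
x, y = df["in_person_share_dm"], df["pass_rate_change_dm"]
slope = (x * y).sum() / (x ** 2).sum()
print(f"Going fully in-person is associated with a {slope:+.1f} pp difference")
```

In this toy data the slope comes out to about 13 points, but that is only because the fake numbers were chosen to echo the paper’s headline estimate; the point of the sketch is the demeaning step, not the value.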
A second paper, using the NWEA data, focuses on the impact of remote schooling on achievement gaps across groups. The researchers find that test scores declined more in areas with less in-person schooling and that racial and income disparities between children widened in those areas.
Remote schooling was not the only factor. Even in districts that were fully in-person for school, test scores declined on average. However, looking at the data it seems clear that the lack of resumption of in-person learning was a significant contributing factor to test score declines.
Test score recovery

By the spring of 2022, virtually all school districts had been offering full-time, in-person instruction for the entire school year. In many places the year was still disrupted by quarantines, but overall it was a first chance for recovery. It is possible to look for this recovery in NWEA data and in state testing data; the national NAEP data includes only the 2022 tests (recall that none were given in 2021), so it cannot speak to recovery from 2021 to 2022.
First, an initial report based on the NWEA data shows some rebounding, although at the end of the 2021-2022 school year, the researchers still observe test scores 5 to 10 percentage points down from baseline in math, and 2 to 4 percentage points in reading.
Second, with recently released data, our team has started to look at the evidence from state test scores. We are actively producing a series of data briefs. You can see them here, with a description here. In these briefs, we look at changes in test scores from 2019 to 2021 and from 2019 to 2022, broken down by district-level learning mode. Here’s an example set of graphs, from Virginia.
What we observe, overall, is that 2022 test scores are still substantially lower than in 2019, but there has been recovery. On average, Virginia recovered about 40% of its pandemic-year test score losses. The losses during the pandemic were largest in districts with less in-person schooling, and although those recovered at a similar rate to the other districts, they remain lower.
On average, across the states we have data for, about 37% of test score losses seem to be recovered by the end of 2022. However, this recovery is not consistent across states. The graphs below show the same figures for Mississippi. The declines were not as large (although they were still very sizable), but the recovery is much more dramatic. In ELA, in particular, Mississippi fully recovered to the 2019 levels by the end of 2022.
This variation in recovery provides an opportunity to learn about what works to help kids catch up, a question that the pandemic has made more salient but that has always been present. One early observation: besides Mississippi, the other state that has shown full recovery of test scores in reading is Tennessee (not yet in our data briefs, but see coverage here). Both Mississippi and Tennessee have invested heavily in phonics-based reading instruction, so this may be (yet more) evidence of the value of those curricula.
The past two years have seen enormous test score declines for kids, pretty much no matter how you measure them. These declines were caused, at least in significant part, by school closures. There has been some recovery, but it is variable and incomplete. There is more to be done.