Over the weekend I organized the refrigerator and cabinets. I even cleaned the produce drawers, revealing I actually have nine heads of garlic lying around. I finally accepted we are not going to use shredded coconut with a 6/20/2017 expiration date and threw it away. It felt really, really good to do this and I think the simple reason is that it gave me a feeling of control. Between COVID and the election and…everything…it’s hard to feel like anything is within our power anymore. It was good to remember that I still have control over whether the butter lives at the top of the fridge (yes) and whether we need to keep tahini which expired in 2015 (no).

Today, I wanted to talk through a new study that a number of you sent me on alcohol and pregnancy (I said I would do this weeks ago but, well, COVID). It seemed a good opportunity to revisit the particulars of the study, and also to step back and talk a little bit about how I think about the value of any given study, and what it adds to what we already know.

(Let me also say at the outset that this is a careful and thoughtful paper, and I really appreciate the authors responding to a few questions I had. I respect what they’ve done very much.)

The paper is here. It was published in late September in the American Journal of Psychiatry. In very broad strokes, it’s a paper which uses data from the ABCD Study to look at the relationship between prenatal alcohol exposure and a variety of outcomes, including a large number of behavioral and cognitive variables, and a number of measures of brain structure. The authors look at prenatal alcohol exposure in several ways, including a linear control for number of drinks consumed, a binary for alcohol consumption or not, and (most relevant here) grouping people by exposure group. Two of their exposure groups represent light drinkers.

The authors find children of moms who drank alcohol have more behavior problems (they also have higher performance on cognitive tasks and executive function in some analyses, although this is not discussed much). They also find some mixed associations with brain structure, which I’m going to leave aside for now as they are hard to interpret. The negative associations with behavior extend to mothers who report drinking a small amount before knowing they were pregnant and not at all after.

That’s the high level overview. The question is: what to make of it?

I write a lot about this in Expecting Better, including a lot of detail about existing studies on this topic. This paper is not alone in analyzing these questions. Most of the literature I summarize is more reassuring: we know that heavy drinking or binge drinking in pregnancy is dangerous, but many studies of low or moderate prenatal alcohol exposure show no impacts. As I discuss there, some studies disagree with this, but they often have very significant biases.

When I write about these studies, or when I read them, I focus on three things. First, the quality of the outcome measures: how well have the authors measured the outcomes we care about? Second, the treatment measures: how carefully and convincingly have they defined drinking behavior? Is it really well measured and capturing what we want to study? And, third, is there a plausible causal interpretation?

This paper, like virtually everything in this literature, is an observational study. The authors compare women whose behavior differs, but the behavior is not randomized. I wrote a couple of newsletters ago about epidurals and autism, and pointed out the concerns with causality there. This literature is subject to similar concerns: are women really comparable across groups?

How does this paper stack up on these three dimensions?

Outcomes

The outcomes in this paper are comprehensive, almost to a fault. There are a huge number of variables considered, virtually every behavior or cognitive measure I could think of, plus all kinds of brain measures. This is a big focus of these data, so it stands to reason they’d do a good job on it.

There is actually a danger with so many outcomes of arriving at false positive conclusions (basically, if you test enough variables, some will be significant). The authors could have done more to adjust for this, but this is a very nerdy statistical point.
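To make the nerdy point concrete, here is a minimal sketch of how fast the chance of a false positive grows with the number of outcomes tested. It assumes the tests are independent and that there is no true effect anywhere, which is a simplification (the outcomes in a study like this are surely correlated). The test counts are hypothetical, not taken from the paper, and the Bonferroni threshold is shown only as one common adjustment, not as what the authors did or should have done.

```python
def familywise_error_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive across independent tests
    when no true effect exists (a simplifying assumption)."""
    return 1.0 - (1.0 - alpha) ** num_tests


def bonferroni_threshold(num_tests: int, alpha: float = 0.05) -> float:
    """Per-test significance cutoff that keeps the overall rate at or below alpha."""
    return alpha / num_tests


# Hypothetical numbers of outcomes, chosen only for illustration.
for num_tests in (1, 10, 50, 100):
    rate = familywise_error_rate(num_tests)
    cutoff = bonferroni_threshold(num_tests)
    print(f"{num_tests:>3} tests: P(any false positive) = {rate:.3f}, "
          f"Bonferroni cutoff = {cutoff:.4f}")
```

With a single test the chance of a spurious finding is the usual 5 percent; with dozens of outcomes it becomes more likely than not that at least one is significant by chance alone.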

Causality

The authors of this paper do a lot to try to convince us that their groups are comparable and to adjust for differences across them. Most convincing is a matching analysis, where they literally try to find children with similar demographics but where mothers have different drinking behavior. Despite this, at the end of the day it is very hard to be fully convincing here, just given the number of differences across the groups in the raw means in the data. This is especially true in the US, where this study was run and where drinking alcohol in pregnancy is heavily stigmatized. The problems of causality here are much more extreme than, say, the epidural study I talked about recently.
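For readers who want to see the basic idea of matching, here is a toy sketch. It is not the authors' procedure: it uses exact matching on two made-up demographic categories, and every record and variable name in it is invented for illustration.

```python
from collections import defaultdict
from statistics import mean

# Toy records, entirely made up: (exposed, demographic_profile, behavior_score)
children = [
    (True,  ("college", "high_income"), 4.0),
    (False, ("college", "high_income"), 3.0),
    (False, ("college", "high_income"), 5.0),
    (True,  ("no_college", "low_income"), 6.0),
    (False, ("no_college", "low_income"), 5.0),
    (True,  ("college", "low_income"), 7.0),  # no unexposed match, so dropped
]


def matched_difference(records):
    """Average exposed-minus-unexposed score gap, comparing each exposed child
    only to unexposed children with the same demographic profile."""
    groups = defaultdict(lambda: {True: [], False: []})
    for exposed, profile, score in records:
        groups[profile][exposed].append(score)

    gaps = []
    for by_exposure in groups.values():
        # Keep only profiles that contain both exposed and unexposed children.
        if by_exposure[True] and by_exposure[False]:
            unexposed_mean = mean(by_exposure[False])
            gaps.extend(score - unexposed_mean for score in by_exposure[True])
    return mean(gaps)


print(matched_difference(children))
```

The limitation is visible in the code itself: matching can only balance the characteristics you wrote down in the profile. Anything unmeasured that differs between the groups is still there.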

Some of the patterns in the analysis also give me pause, because they are hard to square with a causal interpretation. For example, on a number of metrics the data show that behavior problems are worse for children of mothers who drank at low levels throughout pregnancy than for children of women who drank heavily at the start and then at low levels later. It’s hard to see why this would be true under a causal mechanism. Not impossible, but hard.

I have some other nit-picky comments about controls and interpretations, but I think they’d be unlikely to change the results much. The bottom line is that causality is just really, really hard here. This is not a criticism of the authors. Convincing analyses of topics like this with observational data are extremely challenging.

Treatment Definition

The most significant issue with this paper is the definition of drinking behavior. I spend a lot of time on this in Expecting Better, where I focus on papers which collect responses on alcohol consumption during pregnancy and then follow children later. In this study, information on drinking behavior was collected when the child was 9 or 10, at study enrollment.

At the time of enrollment in the study, women were asked about their alcohol consumption during pregnancy. The questions are below.

There are two low alcohol groups in the study. First, “light stable” drinkers reported 1-2 drinks per occasion, fewer than 7 per week throughout pregnancy. This group is small. Second, “light reducers” were women who reported 1-2 drinks per occasion before pregnancy, and then less than 1 drink per occasion after learning they were pregnant. Some of the analysis relies on comparing these “light reducer” women to “abstainers” (those who had less than 1 drink per occasion throughout pregnancy; they may not actually have abstained).

This relies on women correctly recalling their drinking behavior in the weeks prior to pregnancy a decade before. This type of data is not likely to be very reliable. It seems plausible that women might remember something broad about their alcohol consumption during pregnancy. It seems much less plausible that they would remember specifically whether, in the week of a missed period, they had two drinks per occasion or less than one.

The authors are very honest about the fact that this is a weak point of the paper, and it’s simply a limitation of these data. The data has a lot of value, notably the excellent outcome measures, but the treatment is a challenge to interpret.

Summary Thoughts

Putting this together, had this paper been available when I wrote (or revised) Expecting Better, I might have mentioned it, but it would not have been a central piece of the evidence. The treatment measure has real issues, and there are other papers with many of the same positive features without this downside.