This is not the first time I’ve written about Tylenol in pregnancy. It’s not the second time either! But the topic keeps coming up. The latest iteration was a series of scary Instagram posts, some of which reference a possible class-action lawsuit. Here’s an example of what has filled my DMs.

[Image: an Instagram post about a lawsuit against Tylenol.]

This landscape is confusing. On one hand, there are these alarming statements. On the other hand, advisory bodies like the American College of Obstetricians and Gynecologists (ACOG) and the Society for Maternal-Fetal Medicine state that Tylenol is safe during pregnancy. These are bodies — especially ACOG — that are notoriously conservative in their recommendations around pregnancy behavior. Their statements should be reassuring. And yet people wonder at the disconnect. Does this lawsuit know something that ACOG doesn’t?

Today I want to dive into what is going on here. The TL;DR is that the reason for the confusing messaging is that the evidence itself is confusing. While there is a lot of correlational evidence linking Tylenol during pregnancy with autism or ADHD diagnoses in children, this evidence is largely bad, in the sense of leaving us skeptical about causality.

The bulk of this post is going to try to unpack one study for you, as a representative case, to give a sense of what’s hard about this. But first I’ll talk about why the existence of a lawsuit is not, itself, evidence.

One important note: The evidence that suggests risks from Tylenol focuses largely on more extensive exposure, such as taking it for more than 28 days during pregnancy. There is no credible evidence, even correlational, to suggest that taking it occasionally for a fever or headache would be an issue.

The lawsuits

There are a number of class-action lawsuits under consideration, filed against organizations that either manufacture or distribute acetaminophen (this would include those that make Tylenol and also places like CVS and Walmart that sell it). There is some more detail about the lawsuits here. They are at an early phase.

The argument in these lawsuits, to oversimplify, will be that acetaminophen carries risks in pregnancy, that these risks were known, and that these companies did not warn consumers about them. The evidence in favor of harm is the evidence I’ll discuss below; there is no secret additional evidence that has come to light. One challenge in these lawsuits will be proving that the data shows evidence of harm.

Even if that is a long shot, though, it may be worth it because the possible rewards are so large. If a court agreed that there were risks to these products, then damages could be owed to every family in which the pregnant person took Tylenol and a child was diagnosed with ADHD or autism. This is a huge number of people. To be clear: even the most aggressive read of the evidence would say that any impacts of Tylenol on these diagnoses are very small. However, in a lawsuit, anyone who was in this overall class could get damages, even if the chance that they were personally harmed is vanishingly small.

All of this is to say that there are strong incentives to pursue the lawsuit even if the evidence is not very compelling. The existence of the lawsuit should not, per se, make us more likely to believe there is a link in the data.

The evidence

All of the evidence on acetaminophen in pregnancy is based on observational studies. Broadly, these studies collect data on women’s consumption of acetaminophen during pregnancy (did they take it? How often?) and match it to some data on their children. Most commonly, these data include whether the child was diagnosed with autism or ADHD, although there are other neurodevelopmental measures as well.

The authors then look at whether the children of mothers who took the medications during pregnancy are more likely to have these diagnoses.

As with any observational approach to data, this study methodology raises the concern that it is not the Tylenol that is driving any difference observed but other differences across the groups. In this particular case, these possible other differences fall into two categories. One concern is that there may be baseline differences across women — differences in age, education, smoking behavior, etc. A second concern is that there are differences that are driving their consumption of Tylenol. That is: it may be extremely difficult to separate the impact of Tylenol from the impact of whatever it is the person was taking Tylenol for.

To dive into how this manifests, and what you might do about it, I want to talk about one exemplar study, in Norway. This study is reflective of much of the rest of the data, although I would say it’s among the best of the evidence in terms of the sophistication of the methods and the quality of the data.

The Norwegian study

The study is here. This particular study looks not at autism or ADHD but at neurodevelopment at 18 months, using a number of metrics.

The authors start with a sample of about 100,000 women. Data from these women was collected as part of the Norwegian Mother and Child Cohort Study. Women in the study were surveyed about various aspects of their pregnancy, including illness during pregnancy and medications taken.

For about half of the women, researchers could follow their children (51,200 of them) at 18 months (the other half were either lost to follow-up or hadn’t been surveyed yet). At 18 months, the children were assessed with several standard tests to measure gross motor skills, fine motor skills, behavior, and temperament. The researchers also collected data on when children learned to walk.

The primary analyses in the paper focus on comparing children of women who did not take acetaminophen (the active ingredient is known in Europe as paracetamol, but I’m going to stick to calling it acetaminophen for consistency) with those who took it for at least 28 days during pregnancy. In an appendix, the authors show no differences in outcomes for children of mothers who took acetaminophen for less time.

In their first table, the authors show characteristics of the mothers in the no-exposure and the greater-than-28-day exposure groups. They are really different. I’ve captured a few of these differences in the table below.

The most striking differences here — the largest ones — are in variables that seem likely to be reflecting the reason to use a painkiller. For example, 80% of the exposed group report headaches or migraines, versus only 21% of the unexposed group. The exposed group is also much more likely to take other medications, including opioids.

These very significant differences pose an issue for the study, and the authors face it by using an approach called “propensity score matching.” (Teaching moment! Okay, Emily, stay cool.)

Here’s the idea. We’re worried that the people who take medication are different from those who do not, on a bunch of variables. However: imagine for each person with medication exposure, I could find another person without exposure who was exactly the same on all the variables. That is: I find someone who didn’t take acetaminophen but also had headaches, was a smoker, was young, etc. Then I could compare the outcomes for the matched people in the two groups — effectively, looking at impacts while holding constant the observed differences.

This is a procedure called “matching” or sometimes “nearest neighbor matching.” A significant issue with this procedure is that it is often very hard to find a perfect match. Maybe I find someone in the control group who matches on most of the dimensions, but not all. Potentially, I end up throwing away many people who weren’t matched.
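To see how quickly this kind of matching runs out of partners, here is a minimal sketch in Python. The records are invented toy data (three yes/no characteristics plus an exposure flag), and the function name `exact_match` is hypothetical; it pairs each exposed person with an unexposed person who is identical on every characteristic.

```python
from collections import defaultdict

# Hypothetical toy records: (headaches, smoker, under_30, exposed)
people = [
    (1, 1, 1, True),
    (1, 0, 1, True),
    (1, 1, 0, True),
    (1, 1, 1, False),
    (0, 0, 1, False),
    (1, 0, 1, False),
    (0, 0, 0, False),
]

def exact_match(records):
    """Pair each exposed person with an unused, identical unexposed person."""
    # Group unexposed people by their full set of characteristics.
    pool = defaultdict(list)
    for i, (*covs, exposed) in enumerate(records):
        if not exposed:
            pool[tuple(covs)].append(i)
    pairs, unmatched = [], []
    for i, (*covs, exposed) in enumerate(records):
        if exposed:
            candidates = pool[tuple(covs)]
            if candidates:
                pairs.append((i, candidates.pop()))
            else:
                unmatched.append(i)  # no identical partner exists
    return pairs, unmatched

pairs, unmatched = exact_match(people)
print("matched pairs:", pairs)
print("exposed people left unmatched:", unmatched)
```

In this toy data the third exposed person has no identical counterpart and would be dropped from the comparison. With more variables, that happens more often.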

Propensity score matching is an approach that delivers (in theoretical results) the same benefits as nearest neighbor matching, but with fewer restrictions. The underlying idea is that, from the standpoint of causality, I do not actually care that I’m comparing two people whose characteristics are identical. What I care about is that their likelihood of taking the medication is identical. That is: I want to compare two people who seem, based on their observable characteristics, to be equally likely to take the medication. But one of them does and one of them doesn’t. The reason this is less restrictive than the matching described above is that you and I may be equally likely to take the medication, but for different reasons: maybe my chance of taking it is elevated due to headaches, and yours due to back pain. This means we can match people who are not identical on all variables, as long as they are identical on their overall predicted probability of taking the medication.
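Here is a minimal sketch of that idea in Python, using simulated data rather than anything from the study. The variable names, the coefficients, and the hand-rolled logistic regression are all assumptions for illustration; the paper's actual model and matching rules may well differ. The sketch estimates each person's probability of exposure from two characteristics, pairs each exposed person with the unexposed person whose estimated probability is closest, and then checks how different the groups are on headaches before and after matching.

```python
import math
import random

random.seed(0)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Simulated (not real) data. Exposure is more likely with headaches and with
# higher back-pain severity; both coefficients are made up for illustration.
X, exposed = [], []
for _ in range(800):
    headache = 1.0 if random.random() < 0.3 else 0.0
    back_pain = random.gauss(0.0, 1.0)  # hypothetical standardized severity
    p = sigmoid(-2.0 + 2.0 * headache + 1.0 * back_pain)
    X.append([1.0, headache, back_pain])  # leading 1.0 is the intercept
    exposed.append(1 if random.random() < p else 0)

def fit_logistic(X, y, steps=300, lr=1.0):
    """Fit logistic regression by plain gradient descent on the mean log-loss."""
    w = [0.0] * len(X[0])
    n = len(X)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w

# The propensity score is the fitted probability of exposure.
w = fit_logistic(X, exposed)
scores = [sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]

treated = [i for i, e in enumerate(exposed) if e == 1]
controls = [i for i, e in enumerate(exposed) if e == 0]

# Nearest-neighbor matching on the score, with replacement.
matches = {t: min(controls, key=lambda c: abs(scores[c] - scores[t]))
           for t in treated}

def headache_share(indices):
    return sum(X[i][1] for i in indices) / len(indices)

raw_gap = headache_share(treated) - headache_share(controls)
matched_gap = headache_share(treated) - headache_share(list(matches.values()))
print(f"headache gap before matching: {raw_gap:.2f}")
print(f"headache gap after matching:  {matched_gap:.2f}")
```

Note that a matched pair here need not share the same characteristics: someone with headaches and mild back pain can be paired with someone with no headaches and severe back pain, as long as their estimated probabilities of exposure are close.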

(There is a lot of interesting theoretical scholarship about this kind of matching; I’m leaving that aside here.)

When the authors do this matching, they extract from the control group a sample who are much more similar to the sample of exposed women. I’ve re-created the table from above below, with the matched group. You can see that, effectively, this procedure has pulled out women from the control group who are much more like the exposed group.

The primary analysis in the paper then compares these two groups.

The authors compare the two groups on 10 milestones. In most cases, the risk for the exposed children is slightly higher. The difference is statistically significant in one case (a delayed age of walking). The authors argue that there is also evidence of communication problems, although that result is not statistically significant (it is close). The other outcomes generally point to elevated risks, but the estimates cannot reject the null hypothesis that exposure has no impact.

What do we take from this? The most direct conclusion — the one the authors argue for in the abstract — is that long-term acetaminophen exposure in pregnancy is a possible risk for delayed walking and communication issues, and caution is warranted.

However: there are a number of remaining concerns that make this conclusion questionable. For one thing, the results are extremely weak statistically. Of the two findings the authors highlight, one (communication problems) is not significant at conventional levels, and the one that is significant (delayed walking) is an outcome that wouldn’t typically be included as part of this type of testing battery.
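A back-of-the-envelope calculation shows why one significant result out of 10 is not much to go on. Assume, as a simplification that is not taken from the paper, that the 10 outcomes are independent tests, each using a 5% significance threshold. The chance of at least one "significant" result when there is no true effect at all is then:

```python
alpha = 0.05      # assumed significance threshold for each test
n_outcomes = 10   # number of milestones compared

# Probability that at least one of the tests comes up significant by chance.
p_any_false_positive = 1 - (1 - alpha) ** n_outcomes
print(round(p_any_false_positive, 3))  # 0.401
```

In practice the outcomes are likely correlated with one another, so this is only a rough guide, but it suggests that a single significant finding among 10 comparisons is not surprising even if there is no effect.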

A more pernicious issue is that this approach only allows researchers to control for differences across groups that are observed. Effectively, in a perfect version of the matching approach, we’d be comparing (for example) two women, both age 34, both with a high school education, and both of whom have serious migraines. One of them takes the medication and one doesn’t. The theory requires that this choice, conditional on these variables, is unrelated to other important differences across women. But that sometimes seems like a stretch. For example, maybe one person has a better family support system and is able to deal with migraines by taking more naps. But if you acknowledge that, you start to wonder whether it’s the family support that matters rather than the Tylenol.

There is an underlying argument: no matter how good your controls are in an observational study, they are still only controls for the things you can observe. And frankly, that’s a problem.
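To make this concrete, here is a small simulation in Python. Everything in it is invented for illustration: the variable names, the probabilities, and above all the assumption that the medication has no effect whatsoever. Low family support is unobserved; it makes exposure more likely and also makes a delay more likely. The comparison is restricted to women with migraines, which is a simple stand-in for matching on the observed variable.

```python
import random

random.seed(1)

exposed_delays, unexposed_delays = [], []
for _ in range(40000):
    migraine = random.random() < 0.3       # observed by the researcher
    low_support = random.random() < 0.5    # NOT observed by the researcher
    # Exposure depends on migraines and on low support (made-up probabilities).
    p_exposed = 0.1 + 0.4 * migraine + 0.3 * low_support
    exposed = random.random() < p_exposed
    # The outcome depends only on low support: the true medication effect is zero.
    p_delay = 0.05 + 0.15 * low_support
    delay = random.random() < p_delay
    # "Match" on the observed variable by looking only at women with migraines.
    if migraine:
        (exposed_delays if exposed else unexposed_delays).append(delay)

rate_exposed = sum(exposed_delays) / len(exposed_delays)
rate_unexposed = sum(unexposed_delays) / len(unexposed_delays)
print(f"delay rate, exposed:   {rate_exposed:.3f}")
print(f"delay rate, unexposed: {rate_unexposed:.3f}")
```

Under these made-up probabilities, the expected delay rates are about 14% for the exposed group and about 9% for the unexposed group, even though the medication does nothing in the simulation. The entire gap comes from the variable the researcher cannot see.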

Back to the bigger picture

This study from Norway is only one example, but there are many studies like it, all of which have similar findings and similar (or worse) methodological problems. Their methods vary. Other studies, like this one, measure the concentration of acetaminophen in cord blood and use that as their exposure measure. That seems very fancy and scientific, but it’s subject to most of the same problems as the study discussed above. These studies are just measuring exposure differently, not fixing the confounding problems.

What I think is compelling and concerning to people is the consistency with which these findings appear. When I posted on the topic a year ago, I discussed this meta-analysis, which combined data from six European cohort studies, all of which were looking at the relationship between acetaminophen exposure and ADHD. They all found a positive relationship. Intuitively we find this convincing, because if there were a true relationship, we’d expect to see it in all of the different data sources. But it’s worth remembering: if all of the studies are subject to the same methodological concerns, we could see consistent results because they are all wrong in the same way.

That big study also raises, to my mind, questions that remain unanswered. The table below shows the exposure shares and ADHD shares across cohorts. While within each cohort there is a relationship, if we look across cohorts, there doesn’t seem to be a link. At best, we can say there must be other factors that are driving these differences.

In conclusion

I wish there were a clear way to write this conclusion. I do not think it is right to say that there is nothing in the data that would raise concerns for anyone about long-term acetaminophen exposure. I also think that the quality of the data is largely very poor. Frustratingly, it’s difficult to see how we could improve it; randomizing pregnant women in a study like this would generate significant ethical concerns.

I do think we can end with two important points.

First, to reiterate what I said at the top: the meaningful concerns raised here are with long-term exposure during pregnancy, not occasional exposure. If you have a fever during pregnancy, you should take Tylenol, both because it will make you feel better and because of concerns about fever in pregnancy (although these are also overstated).

Second: people take Tylenol for a reason. For many people, the choice may be between debilitating weekly migraines and regular Tylenol usage. The impacts these studies suggest are very small, even if we take them at face value. In making this decision, we should weigh the real, known benefit against the suggestion of this possible risk. Perhaps not everyone will come out at the same place on this, but it is crucial we give people the tools to make the choice for themselves.