I talk a lot here about “panic headlines” — basically, headlines that seem designed to … make you panic. Things like “Even one bag of Doritos causes cancer” or the classic “Kids who drink formula do worse on the SATs.” Within the past week, we had two of them.
First, there was an article published in JAMA Pediatrics about screen time and test scores. The article got wide coverage in the press, as this topic usually does, with headlines like “Screen time linked with developmental delays, study finds.” The paper was a standard observational study. Researchers in Japan asked parents how much screen time their children had around the age of 1 and then followed them until they were 4, testing communication and problem-solving at ages 2 and 4.
The researchers classified children into groups based on their screen-time exposure: less than 1 hour a day, 1 to 2 hours, 2 to 4 hours, or more than 4 hours a day. When they compared children’s communication skills at age 2, they found that kids who were exposed to more screen time had lower scores. Notably, these differences did not persist to age 4.
This paper has a very common correlation-is-not-causality problem. Families where kids were exposed to more screen time were also different in other notable ways, including household income, maternal education, presence of a grandparent, maternal age, and number of siblings. All of these differences are important predictors of communication. Although the authors are able to control for some of them, we should continue to worry that there are other significant unobserved differences. I talk about that much more in this post on why I look at data differently.
This is an easy one to dismiss. Correlation is not causation, and it’s hard to convince yourself that the choice of an hour a day versus four hours a day of television for a 1-year-old is a random choice. To be clear: four hours of screens a day for a 1-year-old will leave relatively little time for other things, so it is worth being intentional about how you use screen time. But this study shouldn’t change how you think about that.
The second paper was a little more interesting, and I want to get into it today. That paper is on the relationship between the Mediterranean diet in pregnancy and child development, and it makes the claim that a mother’s eating the Mediterranean diet during pregnancy improves the child’s development at age 2. What makes this paper more interesting is that it’s based on a randomized trial. The analysis compares children of women who were randomized into a Mediterranean diet during pregnancy with those who were not. Randomization in this way means that, on average, the two groups of families are similar. This approach should avoid the correlation-versus-causation issue that the screen time paper has.
So, is it true then? This is where it gets fun. The answer is no, it’s not likely true. The interesting part is how one can figure that out. Let’s dive in.
Study overview
To set the stage, let me give a slightly more complete overview of the study.
This study recruited 1,221 pregnant women mid-pregnancy (between 19 and 23 weeks) and randomized them into three groups: “usual care” (the control group), a Mediterranean-diet encouragement group, and a mindfulness/stress-reduction group. For the purposes of this discussion, I’m going to focus on the Mediterranean diet group versus the control group.
The Mediterranean diet intervention involved nutrition counseling and providing participants with free extra-virgin olive oil and walnuts. They were given shopping lists and meal plans. The participants in the Mediterranean diet group ended up with a diet that scored higher on adherence to a Mediterranean diet pattern.
When the children were 2, 626 of the families agreed to have them tested using a standard test of child development (the Bayley-III scale).
The headline finding of the paper is that the children of the mothers randomized into the Mediterranean diet group performed 5 points higher on this development scale than those in the usual-care group. This difference is statistically significant.
Bayesian skepticism
Most of this post is about what is going on in the data, behind the scenes. But I want to start with why I was very, very skeptical of this result.
The development scale used here is intended to be comparable to a standard IQ scale. The “standard deviation” of the measure used is about 15. This means the study suggests that encouraging women to eat differently for a few weeks of their pregnancy increases their child’s measured IQ at age 2 by a third of a standard deviation. This is an extremely — implausibly — large effect.
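To make the arithmetic behind that “third of a standard deviation” concrete, here is a quick sanity check using the two numbers above (the 5-point gap and the scale’s roughly 15-point standard deviation; these are illustrative figures, not a reanalysis of the study’s data):

```python
# Convert a raw score gap into standard-deviation units.
# Numbers are the approximate figures discussed above, not exact study outputs.
gap_points = 5      # reported difference between groups on the Bayley-III composite
scale_sd = 15       # approximate standard deviation of the scale

effect_size = gap_points / scale_sd
print(f"Effect size: {effect_size:.2f} standard deviations")  # → Effect size: 0.33 standard deviations
```

Expressing effects in standard-deviation units is what lets you compare this result against other interventions, which is where the implausibility becomes clear.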
That view is informed by spending a huge amount of time looking at data like this. There are few interventions, even large ones that focus on encouraging child development directly, that would deliver effects like this.
There is a fine line here. We want to be open to new results and new findings — that’s how science progresses. But we also want to be aware of what we know from past results. This effect is incredibly improbable. It’s part of what should push us to see if, maybe, something else is going on.
Differential selection in follow-up
In the original study, the 1,221 pregnant women were randomly allocated into the groups. This randomization is what lets us assume that, on average, there are no other differences across the groups when we make later comparisons.
If the researchers were able to measure the development scores for all 1,221 children in the three groups, they could be confident that any differences they saw were attributable to the diet treatment.
That “if” at the start of the last paragraph is the issue. Only half of the children in the study were in the follow-up. In principle, that’s not necessarily a problem as long as the children in the follow-up are a random subset. Where this becomes an issue is if the follow-up differs across groups.
It turns out that’s the problem here. Table 1 in the paper makes this clear. In that table, the authors compare demographics across groups for the kids who are actually involved in the follow-up study. What we see there are large differences in demographics across groups. For example: 72% of the children whose mothers were randomized into the Mediterranean diet group have a mom with a university degree, versus only 64% of the usual-care group; 50% of the kids of the Mediterranean diet group were girls, versus only 43% of the usual-care group. There are also differences in socioeconomic status and child care arrangements at age 2, among other things.
This isn’t a failure of randomization — if we looked at the whole cohort, all of these things would be expected to be balanced. The problem is that there is selection in who was in the follow-up group. For some reason, the people in the Mediterranean diet group who chose to be in the follow-up were more educated, had more resources, and had more girl children. These are all features that contribute to higher test scores at this age. So even if the initial groups were chosen randomly, the people who were in the follow-up were not.
The result of this is that we are back to the same basic issues we have with non-randomized data. Which means we need to go to the appendix tables. In Appendix Table e2, the researchers adjust their results for fetal sex and a single metric of socioeconomic status. When they do even this minimal adjustment, their results go down by a substantial amount. With more adjustments, we’d expect the effects to go down more.
Bottom line: While this study was randomized initially, the analysis isn’t. And it is therefore subject to the same kind of biases as non-randomized studies.
Why did this happen?
Why did this happen? It’s an interesting question, and I wasn’t able to get to the bottom of it based on the published data. The treatment here wasn’t blinded — people knew which group they were in, and so did researchers. It seems possible that this impacted who chose to participate in the follow-up in some way that biased the results.
What’s the lesson here?
First, I am sorry to say, but marginal adherence to a Mediterranean diet during the second half of pregnancy is not going to deliver 5 IQ points to your toddler. It’s still a good diet! Just not for this reason.
Second, I’d come back to the plausibility point above. Just because a result comes from a randomized trial, we should still ask whether it’s reasonable given everything else we know in the world.
Third and perhaps most important: you can never go wrong looking at the appendix tables.