Why I Look at Data Differently

A lesson on residual confounding

Emily Oster

11 min Read
A question I get frequently: Why does my analysis often disagree with groups like the American Academy of Pediatrics or other national bodies, or other public health experts, or Andrew Huberman (lately I get that last one a lot)? The question usually comes up in the context of observational studies of topics in nutrition or development.

Some examples:


The analysis of processed food and cancer is emblematic of many of these cases. In that post, I argued that the relationship observed in the data was extremely likely to reflect correlation and not causation. My argument rested on the observation that people who ate differently also differed on many other features.

In response, a reader wrote:

You emphasize causation vs. correlation, and I think you are pointing to potential confounders that could actually be the root cause of the findings. My question is — can’t and don’t study researchers control for that in their analysis? Can’t they look at the link between screen time and academic success while keeping potential confounders equal across the comparison groups? And if so, wouldn’t that help rule out the impact of other factors and strengthen the case that there is a true link?

This is a very good question, and it clarifies for me where many of the disagreements lie.

The questioner essentially notes: the reason we know that the processed food groups differ a lot is that the authors can see the characteristics of individuals. But because they see these characteristics, they can adjust for them (using statistical tools). While it’s true that education levels are higher among those who eat less processed food, by adjusting for education we can come closer to comparing people with the same education level who eat different kinds of food.

However, in typical data you cannot observe and adjust for all differences. You do not see everything about people. Sometimes this is simply because our variables are rough: we see whether someone has a family income above or below the poverty line, but not any more details, and those details are important. There are also characteristics we almost never capture in data, like “How much do you like exercise?” or “How healthy are your partner’s behaviors?” or even “Where is the closest farmers’ market?”

For both of these reasons, in nearly all examples, we worry about residual confounding. That’s the concern that there are still other important differences across groups that might drive the results. Most papers list this possibility in their “limitations” section.

We all agree that this is a concern. Where we differ is in how much of a limitation we believe it to be. In my view, in these contexts (and in many others), residual confounding is so significant a factor that it is hopeless to try to learn causality from this type of observational data. 

This position drives a lot of my concerns with existing research. Thinking about these issues is a huge part of my research and teaching. So I thought I’d spend a little time today explaining why I hold this position. I’m going to start with theory and then discuss two pieces of evidence.

A quick note: This post focuses on concerns about approaches that take non-randomized data and argue for causality based on including observed controls. There are other approaches to non-randomized data (e.g. difference-in-differences, event studies) that support stronger causal claims. See some discussion of those in this older post.

Some theoretical background

Conceptually, the gold standard for causality is a randomized controlled trial. In the canonical version of such a trial, researchers randomly allocate half of their participants to treatment and half to control. They then follow them over time and compare outcomes. The key is that because you randomly choose who is in the treatment group, you expect them, on average, to be the same as the control group other than the presence of the treatment. So you can get a causal effect of treatment by comparing the groups.
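(For readers who like to see things concretely: below is a tiny simulation sketch, entirely invented by me and not from any study, of why random assignment works. Because the treatment is assigned by coin flip, a simple difference in group means lands close to the true effect.)

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Invented setup: the treatment has a true causal effect of +2 on some outcome.
true_effect = 2.0

# Participants differ in an underlying health index that also affects the outcome.
health = rng.normal(0, 1, n)

# Random assignment: by construction, treatment is unrelated to the health index.
treated = rng.integers(0, 2, n)

outcome = 10 + true_effect * treated + 3 * health + rng.normal(0, 1, n)

# Because assignment is random, the raw difference in means recovers the causal effect.
diff = outcome[treated == 1].mean() - outcome[treated == 0].mean()
print(f"Difference in means: {diff:.2f} (true effect: {true_effect})")
```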

Randomized trials are great but not always possible. A lot of what is done in public health and economics aims to estimate causal effects without randomized trials. The key to doing this is to isolate a source of randomness in some treatment, even if that randomization is not explicit.

For example: Imagine that you’re interested in the effect of going to a selective high school on college enrollment. One simple thing to do would be to compare the students who went to the selective high school with those who did not. But this would be tricky, because there are so many other differences across the students.

Now imagine that the way that admission to the high school works is based on a test score: if you get a score above some cutoff, you get in, and if you are below, you do not. With that kind of mechanism, we can get closer to causality. Let’s say the cutoff score is 150. You’ve got some students who scored 149 and some who scored 150. The second group gets in, the first doesn’t. But their scores are really similar. It may be reasonable to claim that it is effectively random whether you got 149 or 150 — the difference is so small, it could happen by chance. In that case, you can try to figure out the causal effect of the selective high school by comparing the students just above the cutoff with those just below.

This particular technique is called regression discontinuity; it’s part of a suite of approaches to estimate causal effects that take advantage of these moments of randomness in the world. The moments do not need to be truly random, but they do need to be driving the treatment and not driving the outcome you are interested in.
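(Here is the same idea as a toy simulation. The cutoff, the variable names, and the effect sizes are all invented for illustration. The naive comparison of admitted versus non-admitted students overstates the effect of the school, while the comparison of students just around the cutoff comes close to the true effect.)

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Invented setup: a test score determines admission, with a cutoff at 150.
score = rng.normal(150, 10, n)
admitted = (score >= 150).astype(int)

# Enrollment depends on underlying ability (proxied by the score) AND on admission.
true_effect = 0.10  # assumed causal effect of the selective school on enrollment
p_enroll = np.clip(0.3 + 0.01 * (score - 150) + true_effect * admitted, 0, 1)
enrolled = rng.uniform(0, 1, n) < p_enroll

# Naive comparison: all admitted vs. all non-admitted students (confounded by ability).
naive = enrolled[admitted == 1].mean() - enrolled[admitted == 0].mean()

# Regression-discontinuity-style comparison: only students within a point of the cutoff.
near = np.abs(score - 150) < 1
rd = (enrolled[near & (admitted == 1)].mean()
      - enrolled[near & (admitted == 0)].mean())

print(f"Naive gap: {naive:.3f}, near-cutoff gap: {rd:.3f} (true effect: {true_effect})")
```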

We can take this lens to the kind of observational data that we often consider. Let’s return to the processed food and cancer example. The approach in that paper was to compare people who ate a lot of processed food with those who ate less. Clearly, in raw terms, this would be unacceptable because there are huge differences across those groups. The authors argue, though, that once they control for those differences, they have mostly addressed this issue.

This argument comes down to: once I control for the variables I see, the choice about processed food is effectively random, or at least unrelated to other aspects of health.

I find this fundamentally unpalatable. Take two people who have the same level of income, the same education, and the same preexisting conditions, and one of them eats a lot of processed food and the other eats a lot of whole grains and fresh vegetables. I contend that those people are still different. That their choice of food isn’t effectively random — it’s related to other things about them, things we cannot see. Adding more and more controls doesn’t necessarily make this problem better. You’re isolating smaller and smaller groups, but still you have to ask why people are making different food choices.

Food is a huge part of our lives, and our choices about it are not especially random. Sure, it may be random whether I have a sandwich or a salad for lunch today, but whether I’m eating a bag of Cheetos or a tomato and avocado on whole-grain toast — that is simply not random and not unrelated to other health choices.

This is where, perhaps, I conceptually differ from others. I have to imagine that researchers doing this work do not hold this view. It must be that they think that once we adjust for the observed controls, the differences across people are random, or at least are unrelated to other elements of their health.
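(A small simulation sketch of this worry, with every name and number invented: an unobserved health-mindedness trait drives both the food choice and the health outcome. Controlling for the observed variable shrinks the estimated association, but it stays far from the true causal effect, which is zero by construction.)

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 20_000

# Observed control (education) and an UNOBSERVED trait (health-mindedness).
education = rng.normal(0, 1, n)
health_mindedness = 0.5 * education + rng.normal(0, 1, n)

# Processed-food consumption depends on both; it has NO causal effect on the outcome here.
processed_food = -0.5 * education - 0.8 * health_mindedness + rng.normal(0, 1, n)

# The health outcome depends on education and the unobserved trait, not on the food itself.
outcome = 1.0 * education + 1.5 * health_mindedness + rng.normal(0, 1, n)

df = pd.DataFrame({"outcome": outcome, "processed_food": processed_food,
                   "education": education})

raw = smf.ols("outcome ~ processed_food", data=df).fit()
adjusted = smf.ols("outcome ~ processed_food + education", data=df).fit()

# Adding the observed control shrinks the coefficient toward zero but does not remove it.
print(f"Raw coefficient:      {raw.params['processed_food']:.2f}")
print(f"Adjusted coefficient: {adjusted.params['processed_food']:.2f}  (true effect: 0)")
```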

This is a theoretical disagreement. But there are at least two things in data that have really reinforced my view — one from my own research and one example from my books.

Selection on observables: vitamins

Underlying the issue of correlation versus causation are human choices. This is especially true in nutrition. The reason it is hard to learn about causality is that different people make different choices. One of the possible reasons for those different choices is different information, or different processing of information.

A few years ago, I got curious about the role of information — of news — in driving these choices, and I wrote a paper that looked at what happened to health behaviors after changes in health information. I wrote at length about that paper here, but the basic idea was to analyze who adopts new health behaviors when news comes out suggesting those behaviors are good.

The main application is vitamin E. In the early 1990s, a study came out suggesting vitamin E supplements improved health. What happened as a result was that more people took vitamin E. But not just any people. The new adopters were more educated, richer, more likely to exercise, less likely to smoke, more likely to eat vegetables. In turn, over time, as these people started taking the vitamin, vitamin E started to look even better for health.

Over a period of about a decade, vitamin E went from being only mildly associated with lower mortality to being strongly associated with lower mortality. This is not because the impacts of the vitamin changed! It was because the people who took the vitamin changed. And, importantly, these patterns persisted even when I put in controls.
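(Here is a stylized sketch of that dynamic, not the actual analysis from my paper, with all numbers invented: the vitamin has zero effect in both periods, but once adoption becomes concentrated among health-minded people, the estimated association grows even with a control included.)

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
n = 20_000

# Invented sketch: the vitamin has NO effect on health in either period.
def simulate(selection_strength):
    education = rng.normal(0, 1, n)
    health_mindedness = 0.5 * education + rng.normal(0, 1, n)
    # Adoption: a 10% baseline, plus extra uptake among the most health-minded,
    # which becomes much stronger after the favorable news coverage.
    p_take = 0.10 + selection_strength * (health_mindedness > 1)
    takes_vitamin = (rng.uniform(0, 1, n) < p_take).astype(int)
    health = 1.0 * health_mindedness + 0.5 * education + rng.normal(0, 1, n)
    df = pd.DataFrame({"health": health, "takes_vitamin": takes_vitamin,
                       "education": education})
    return smf.ols("health ~ takes_vitamin + education", data=df).fit()

before_news = simulate(selection_strength=0.1)
after_news = simulate(selection_strength=0.5)

# The controlled "effect" of the vitamin grows once healthier people start taking it,
# even though the true effect is zero in both periods.
print(f"Association before the news: {before_news.params['takes_vitamin']:.2f}")
print(f"Association after the news:  {after_news.params['takes_vitamin']:.2f}")
```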

What this says to me is that these biases in our conclusions — and I saw this in vitamins, but also in sugar and fat — are malleable based on the information out there in the world. Once you acknowledge that what is going on here is people are reading news and reacting to it in different ways, it is hard to believe that the limited observable characteristics we can control for are enough.

Evolving coefficients: breastfeeding

The second important data point for me is looking carefully at what happens in many of these situations when we introduce more and better controls.

The link between breastfeeding and IQ is a good example. This is a research space where you can find many, many papers showing a positive correlation. The concern, of course, is that moms who breastfeed tend to be more educated, have higher income, and have access to more resources. These variables are also known to be linked to IQ, so it’s difficult to isolate the impacts of breastfeeding.

What these papers typically do is control for some observable differences. And, like the discussion above, we might think, “Well, isn’t that enough? If we can see these detailed demographics, isn’t that going to address the problem?”

The paper I like the best to illustrate the fact that, no, that doesn’t address the problem is one that used data that — among other things — included sibling pairs. The authors of this paper do four analyses of the relationship between breastfeeding and IQ:

  1. Raw correlation — no adjustment for anything
  2. Regression adjusting for standard demographics (parental education, etc.)
  3. Regression adjusting for standard demographics plus adjusting for mom IQ score
  4. Within-sibling analysis: compare two siblings, one of whom was breastfed and one of whom was not

The graph below shows their results. When they just compare groups — without adjusting for any other differences — there is a large difference in IQ between breastfed and non-breastfed children. When they add in some demographic adjustments, this difference falls but is still statistically significant. This is where most papers stop. But as these authors add their additional controls, eventually they get to an effect of zero. Comparing across siblings, there is no difference at all.
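(For the statistically inclined, here is a rough sketch of what those four specifications look like in code, run on simulated data rather than the paper's actual data, with all variable names and effect sizes invented. In the simulation, breastfeeding has no causal effect, yet the raw and demographics-adjusted coefficients are positive; only the richer controls and the within-family comparison get back to zero.)

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n_families = 1_000

# Simulated data, two siblings per family (all names and effect sizes invented).
# Family background drives both breastfeeding and child IQ; breastfeeding itself
# has zero causal effect in this simulation.
fam = pd.DataFrame({
    "family_id": np.arange(n_families),
    "mom_iq": rng.normal(100, 15, n_families),
})
fam["mother_educ"] = 12 + 0.1 * (fam["mom_iq"] - 100) + rng.normal(0, 2, n_families)
kids = fam.loc[fam.index.repeat(2)].reset_index(drop=True)
kids["breastfed"] = (rng.uniform(0, 1, len(kids))
                     < 0.3 + 0.01 * (kids["mom_iq"] - 100)).astype(int)
kids["iq"] = (60 + 0.3 * kids["mom_iq"] + 0.8 * kids["mother_educ"]
              + rng.normal(0, 8, len(kids)))

specs = {
    "1. raw":            "iq ~ breastfed",
    "2. + demographics": "iq ~ breastfed + mother_educ",
    "3. + mom IQ":       "iq ~ breastfed + mother_educ + mom_iq",
    "4. siblings":       "iq ~ breastfed + C(family_id)",
}
for name, formula in specs.items():
    m = smf.ols(formula, data=kids).fit()
    print(f"{name:<17} breastfed coefficient = {m.params['breastfed']:.2f}")
```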

The point of this discussion is not to get in the weeds on breastfeeding (you can read my whole chapter from Cribsheet about it). This is an illustrative example of a general issue: the control sets we typically consider are incomplete. There are a lot of papers that report effectively only the first two bars in the graph above. But those simple observable controls are just not sufficient. The residual confounding is real and it is significant.

(If you want another example, I discuss a very similar kind of issue with studies about screen time. This problem is everywhere.)

Closing thoughts

The question of whether a controlled effect in observational data is “causal” is inherently unanswerable. We are worried about differences between people that we cannot observe in the data. We can’t see them, so we must speculate about whether they are there. Based on a couple of decades of working intensely on these questions in both my research and my popular writing, I think they are almost always there. I think they are almost always important, and that a huge share of the correlations we see in observational data are not close to causal.

There are a few final notes on this.

First: A common approach in these papers is to hedge in the conclusion by saying, “Well, it might not be causal.” I find this hedge problematic. If the relationship between processed food and cancer isn’t causal, why do we care about it? The obvious interpretation of this result is that you should stop eating processed foods. But if the result isn’t causal, that interpretation is wrong. This hedge is a cop-out. And this approach — to bury the hedge in the conclusion — encourages the poorly informed and inflammatory media coverage that often follows.

Second: I recognize that other people may disagree and find these relationships more compelling. I believe we can have productive conversations about that. To my mind, though, these conversations need to be grounded in the theory I started with. That is, if you want to argue that there is a causal relationship between processed food and cancer, you need to be willing to make a case that you’re approximating a randomized trial with your analysis. If we focus our discussion on that claim, it will discipline our disagreements.

And last: Thank you for indulging my love of econometrics today. My dad may be the only person who gets this far, but even so, it was worth it.

The bottom line

  • Observational studies, especially of topics in nutrition or development, often don’t account for the many differences across people.
  • The conclusions of studies like these aren’t always untrue, but they are incomplete.
5 Comments
multivariatemama
24 days ago

Echoing all of the comments above, also a huge fan of your econometrics articles and just general passion for data. You really have a knack for making stats sexy.

I wonder what you think observational studies are good for? I remember in a previous article you mentioned they are good for hypothesis generation.

I am a developmental psychology PhD student, and mom of a 2 yr old (with one on the way). Our field has a long history of conducting experimental studies using embarrassingly small samples of largely homogenous populations. (Some) of these studies do allow for causal interpretations, but the extent to which those findings reflect something true about a broader population is suspect. A lot of the work I do is observational, with the goal of documenting naturally occurring variation in the population. More specifically, how variation in context (e.g., racial diversity and inequality in a neighborhood) predicts aspects of social-cognitive development (e.g., racial attitudes).

My research is not aimed at parents, but is instead intended to document trends that may (possibly in some tiny way) inform policy decisions.

Extending the thread of observational data related to nutrition: perhaps the conversation about these types of data should not be about individual choice (to eat processed food or not, to take the vitamin or not) but instead about how to support access to nutrition for everyone.

I suppose my question is, if there is no real “downside” to a treatment (e.g., increasing access to nutritionally dense foods) do you feel (converging) observational data can help us inform broad policy decisions? Or do you feel that even this is dangerous?

Amanda
26 days ago

I totally agree on your general points here. It’s part of why I love reading your classic takedowns of panic headlines.

I think one question that needs to be addressed for the next step is often, what do I do now knowing that the panic headline is unfounded, and that we may never have the true randomized study we’d need to make conclusions?

I think it depends on the kinds of stakes involved. I felt more confident using some formula for my babies after reading your work. Most babies on formula do fine, and there is no theoretical reason to think they would not do fine. Is the risk that they might theoretically lose a few IQ points? That’s not supported, and it’s not compelling.

But when it comes to screen time or processed food, or perhaps I should say excessive or regular use of these, the stakes seem different to me. I can’t *prove* that these things are bad or cause certain outcomes. Likely nobody ever convincingly will. We’re not going to do randomized studies on these things. But there are real risks (to teen mental health, or to metabolic health). There are LOTS of people out there with mental health struggles or metabolic health problems. Is it the screens? Is it the processed foods? Maybe, maybe not, but the risks are real enough to me that I think caution is appropriate. Absence of evidence is not evidence of absence.

By contrast, there don’t seem to be a whole lot of toddlers out there with significant problems we could ever plausibly link to formula.


KatS
26 days ago
Reply to  Amanda

I love this statement: Absence of evidence is not evidence of absence. Often we see on here “there is no data to support XYZ,” which is NOT the same as “the data shows that XYZ is untrue.”

Usually in these Panic Headline analyses, the conclusion is that it’s just correlation or a bad study and we shouldn’t worry. But the truth is that cancer, metabolic issues, and mental health issues are all increasing and must be caused by something (or several things). What do you believe is behind these increases? It would be a helpful addition to these articles for you to tell us what you think SHOULD be studied and how the experiments should be set up so that we can actually get some reliable answers.

jfc.combe
1 month ago

Love this article!!!

Jennifer H
26 days ago
Reply to  jfc.combe

Same! It’s not just for your dad 🙂
