Emily Oster

8 min Read Emily Oster

Hot Dogs, Pregnancy, and Empirical Methods

A review of the many problems with case-control studies

Last week on Instagram, an account with over a million followers shared a “fact”: eating hot dogs in pregnancy increases child brain cancer risk by 33%. The claim was that this was supported “by studies” and, indeed, I was able to find the study they referenced. I posted immediately to dispel this claim, and to discuss the limitations of the study referenced.

The main issue is that the evidence relies on a case-control study design, which is often deeply problematic. Despite this, many of the scarier claims we see as parents are based on this type of study. Today I want to go into what this type of study does, what some of the pitfalls are, and when to be concerned.

If you want the 90-second version about the hot dogs, just watch the reel. If you want the data dive, keep reading.

What is a case-control study (and why would one do it)?

It’s easiest to explain this by starting with a more familiar study design. Consider the kinds of studies that link (for example) vegetable consumption to high blood pressure. Studies like this typically make use of large observational data sets. They take a survey like the CDC’s National Health and Nutrition Examination Survey (NHANES), which asks people about their diet and also measures blood pressure. Researchers observe who eats more vegetables, and they ask whether that correlates with high blood pressure readings.

(Below I’ll get into the issues with this approach, so hang on if you’re ready to complain.)

This study design is appropriate when you are studying an outcome, like high blood pressure, that is common. This aspect is important because a data set like the NHANES isn’t infinitely large — they have about 11,000 participants a year. That is plenty if you are studying high blood pressure, though, because it’s a common issue; in a sample that size, you’ll have a lot of people with the condition.

This approach does not work well, though, when your outcome is rare. Imagine I want to study a condition that appears in 1 of 10,000 people. In the NHANES, I’d expect about one person (on average) to have the condition in each year. That isn’t enough people to say anything useful, even about what is merely correlated with the condition.
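To see the arithmetic behind this, here is a minimal sketch in Python. The sample size and the 1-in-10,000 rate come from the text above; the 30% figure for a common condition is an illustrative assumption of mine, not a number from the NHANES.

```python
def expected_cases(sample_size: int, prevalence: float) -> float:
    """Expected number of people with a condition in a random sample."""
    return sample_size * prevalence


# From the text: roughly 11,000 survey participants per year.
participants_per_year = 11_000

# A rare condition that appears in 1 of 10,000 people.
rare = expected_cases(participants_per_year, 1 / 10_000)

# ASSUMPTION: a 30% prevalence, chosen only to contrast with a common condition.
common = expected_cases(participants_per_year, 0.30)

print(f"Rare condition: about {rare:.1f} expected cases per year")
print(f"Common condition: about {common:.0f} expected cases per year")
```

With roughly one expected case, there is no group of affected people to compare against anyone else, which is why a different design is needed.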

In studying rare outcomes like this, it is therefore common to use a “case-control” approach. This works by beginning with identifying a set of cases — people who experience the rare outcome. The researchers then identify a set of controls — people who are otherwise similar but didn’t experience this rare condition. You then collect information about behaviors and circumstances for both groups, and try to identify behaviors that are more common in the case group.

To be concrete, I will go through the hot dog and childhood brain cancer example. Childhood brain cancer is extremely rare — the case rate is in the range of 1 to 5 per 100,000 children per year. Even an extremely large survey would be unlikely to have any cases. When studying it, therefore, researchers start by recruiting families with a child brain cancer case. They then identify a sample of (typically) similar-age children who did not have brain cancer. Parents are asked about all kinds of behaviors, and, if you’re interested in pregnancy, women are asked about their behavior during pregnancy (including dietary choices).

The analysis of hot dogs proceeds by comparing the hot-dog-eating behavior among the mothers of the children who had brain cancer with the hot-dog-eating behavior among the mothers of healthy children. If hot dog eating is significantly more common in the mothers of affected children, then this is identified as a potential cause.
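The comparison described above is usually summarized as an odds ratio. Here is a rough sketch of that calculation in Python. The counts are hypothetical, made up purely for illustration, and are not taken from any of the hot dog studies.

```python
def odds_ratio(exposed_cases: int, unexposed_cases: int,
               exposed_controls: int, unexposed_controls: int) -> float:
    """Odds of exposure among cases divided by odds of exposure among controls."""
    odds_in_cases = exposed_cases / unexposed_cases
    odds_in_controls = exposed_controls / unexposed_controls
    return odds_in_cases / odds_in_controls


# HYPOTHETICAL counts, for illustration only:
# 200 case mothers, 80 of whom report eating hot dogs in pregnancy;
# 300 control mothers, 100 of whom report eating hot dogs in pregnancy.
example_or = odds_ratio(
    exposed_cases=80, unexposed_cases=120,
    exposed_controls=100, unexposed_controls=200,
)

print(f"Odds ratio: {example_or:.2f}")
```

An odds ratio above 1 means the behavior is reported more often among the cases. On its own, it says nothing about why.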

One thing to note about case-control studies is that they cannot identify levels of risk. It isn’t possible to use this design to say that the risk of something is elevated from (say) 1% to 2%, because we do not have denominators to identify levels. The results are reported as odds ratios, which get described in relative terms: this behavior increases the risk by 20% or 50% or 100%. Those statements can make results seem a lot scarier than they otherwise would. Since these are rare conditions, a 50% increase in risk may be an extremely small change in absolute terms.
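To put a relative increase in absolute terms, you need a baseline rate from somewhere outside the case-control study. As a back-of-the-envelope sketch, the Python below combines the 33% figure from the Instagram claim with the baseline of 1 to 5 cases per 100,000 children per year quoted earlier. It assumes, for illustration only, that the 33% can be read as a relative increase in risk, which is a reasonable approximation for an odds ratio when the outcome is this rare. It does not assume the effect is real.

```python
def extra_cases_per_100k(baseline_per_100k: float, relative_increase: float) -> float:
    """Additional cases per 100,000 implied by a relative increase in risk."""
    return baseline_per_100k * relative_increase


# From the text: the claimed 33% increase, and a baseline of 1 to 5 per 100,000 per year.
claimed_increase = 0.33

for baseline in (1, 5):
    extra = extra_cases_per_100k(baseline, claimed_increase)
    print(f"Baseline {baseline} per 100,000: {extra:.2f} extra cases per 100,000 per year")
```

Even taken at face value, the claim would amount to somewhere between a fraction of one and fewer than two additional cases per 100,000 children per year.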

This approach is commonly used to study rare outcomes in many areas, including pediatrics. Studies of SIDS, stillbirth, and childhood cancer all commonly rely on this approach. This is by necessity, and these studies can be done well. But they are also subject to significant concerns.

What’s the problem?

Again, let’s step back to our more standard example of vegetable consumption and high blood pressure. Imagine that researchers using the NHANES found a link between the two. As I have noted many times before, a primary issue with this type of study is that the people who eat more vegetables also behave differently in other ways, so it’s hard to know if the vegetables are the key behavior.

This correlation-not-causation concern is important. But what is good about this study design is that everyone in the study is drawn from the same sample. That is: when researchers go out to recruit participants for the NHANES, they do so with no knowledge about their blood pressure or vegetable habits. In many cases with large surveys, the intention is to be representative of some broader population. This means that while we worry about the interrelationship between various behaviors, we do not worry that the people with high blood pressure are drawn from a completely different population.

In case-control studies, we add this layer of concern. The design of the study often results in the samples being drawn from different populations. To consider the brain cancer case: it is common to identify cancer patients by working with a hospital; researchers would attempt to enroll (say) all the pediatric brain cancer cases at a hospital or hospital system.

The control group is recruited from the general population. Herein lies a problem. When you recruit people for a study from the general population, you typically get a fairly selected sample. People who are in the case group — with a sick child — usually have high participation because they hope to get some answers. People who are unaffected but volunteer to participate tend to be very different — in some ways that you can see and some that are harder to see. When you compare behaviors between cases and controls, you now worry about differences in the sampling that could drive your results.

In the hot-dogs-and-brain-cancer data, the control population in the largest study is on average much better educated than the case group. This is almost certainly due to recruitment: when you go to recruit people to be in studies, better-educated people are more likely to volunteer. What the authors find is that the control group is less likely to do things like eat hot dogs, and more likely to take vitamins. One possibility is that these behaviors are associated with pediatric brain cancer. But another, in my view much more plausible, explanation is that these behaviors are associated with education, and the difference in the samples is explaining what we see.

These issues arise here even though parental education and childhood brain cancer have no clear reason to be linked. It’s not that there are other behaviors we need to control for; it’s that the entire exercise is flawed based on participant selection.
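To make the selection problem concrete, here is a toy calculation in Python. Every number in it is a hypothetical assumption, not a figure from the study. In this made-up world, hot dog eating depends only on education and has no effect at all on cancer. The only difference between the groups is that the controls are better educated than the cases.

```python
def hot_dog_rate(share_more_educated: float,
                 rate_less_educated: float,
                 rate_more_educated: float) -> float:
    """Share of a group that eats hot dogs, given the group's education mix."""
    return ((1 - share_more_educated) * rate_less_educated
            + share_more_educated * rate_more_educated)


def odds(p: float) -> float:
    """Convert a probability into odds."""
    return p / (1 - p)


# ASSUMPTIONS (all hypothetical): hot dog eating depends only on education.
RATE_LESS_EDUCATED = 0.50
RATE_MORE_EDUCATED = 0.30

# Cases reflect the general population; volunteer controls skew better educated.
case_rate = hot_dog_rate(0.40, RATE_LESS_EDUCATED, RATE_MORE_EDUCATED)
control_rate = hot_dog_rate(0.70, RATE_LESS_EDUCATED, RATE_MORE_EDUCATED)
biased_or = odds(case_rate) / odds(control_rate)

# Controls drawn with the same education mix as the cases.
matched_rate = hot_dog_rate(0.40, RATE_LESS_EDUCATED, RATE_MORE_EDUCATED)
matched_or = odds(case_rate) / odds(matched_rate)

print(f"Odds ratio with better-educated controls: {biased_or:.2f}")
print(f"Odds ratio with matched controls: {matched_or:.2f}")
```

Under these assumptions, the mismatched samples produce an odds ratio well above 1 even though hot dogs do nothing, and the apparent effect vanishes once the controls come from the same population as the cases.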

It is possible to do a study like this in a way that is more believable. In Cribsheet I talk about one study of the link between breastfeeding and SIDS deaths that I think is compelling. In this study, the “cases” are infants who died of SIDS. The study takes place in the U.K., where home-visiting nurses visit parents post-birth. The authors use as controls the infants who are in the same nurse home-visiting rotation. In this way, they are really pulling from the same sample. To be clear: this study still has all the problems of correlation and causation, but it does not also have the differential sample problem (the authors do not find links between breastfeeding and SIDS in this study).

Case-control studies have a number of other potential problems. One issue is how hard it is for people to recall behaviors that occurred in the past (e.g., how often you ate hot dogs while pregnant seven years ago), along with the possibility that this recall may be influenced by which group you are in. If you think that hot dogs cause brain cancer, maybe you over-remember your consumption. It’s hard to evaluate these issues, and they vary across designs.

Bottom line: Sometimes case-control studies are the only option, but they should be interpreted with extreme caution.

Second bottom line: It is fine to eat hot dogs in pregnancy.
