Prenatal Tests and False Positives

Last week, the New York Times published a fantastic investigative piece on false positives in prenatal testing. One of the authors was the incomparable Sarah Kliff, and it is an absolutely awesome combination of on-the-ground reporting with patients, research into the companies that provide testing, and data visualization.

The conclusion of the piece was in some ways very alarming: these tests were not nearly as reliable as they were sold to be.

Many people emailed me to ask about it, and the piece is so good that I wondered if there was really anything for me to add. But then I realized that there were no equations and no mention of Bayes’ rule. So I’m swooping in with more statistical lingo to explain, just a bit more, why the conclusions shouldn’t have been surprising.

Background on prenatal testing

Before getting into those equations, it’s worth stepping back to give a little background on the focus of the article, which is non-invasive prenatal testing.

When I was pregnant with Penelope, back in 2010, screening for chromosomal abnormalities (e.g. Down syndrome, other trisomies) came in two types. First, an ultrasound screening, which provided some information but with a lot of missed diagnoses. Second, invasive testing (either placental sampling earlier in pregnancy or amniocentesis later), which carried some risk of miscarriage but was more accurate.

By the time I was pregnant with Finn in 2014, a new option was available: non-invasive prenatal testing, using cell-free fetal DNA technology. These tests make use of the fact that some fetal DNA circulates in the maternal bloodstream. Treated correctly, a sample of blood from the pregnant person can be used to detect abnormalities in the fetus. To somewhat simplify, the approach is to look for evidence of DNA that wouldn’t otherwise be in the mother.

For example: let’s say you wanted to know the baby’s sex. The pregnant person typically has two X chromosomes. A male baby will have an XY, and a female will have XX. If you sample mom’s blood and you find evidence of circulating Y chromosomes in the cell-free DNA, this indicates a male fetus (since the mom’s own cell-free DNA wouldn’t contain a Y chromosome).

In their early conception, these tests were used to detect fetal sex and the three primary trisomies (Down syndrome, trisomy 13, and trisomy 18). Detecting a trisomy means, effectively, looking for an imbalance in the presence of these chromosomes in the cell-free DNA. Assuming the pregnant person has two copies of each chromosome, an excess of chromosome 21 in the cell-free DNA points to an extra copy of chromosome 21 in the fetus, which would suggest Down syndrome.

These tests are very accurate for determining sex and Down syndrome in particular. Notably, they are substantially more accurate — both in terms of fewer false positives and false negatives — than the non-invasive ultrasound options that preceded them. However, they are still screening tests and not diagnostic. To be certain about these conditions, it is necessary to follow up a positive test with some invasive testing that is able to sequence fetal DNA.

The tests expanded

As a method for screening for major trisomies, there is some agreement about the value of these NIPT tests, as they are called. However, and this is the topic of the New York Times article, companies have started using these approaches to test for much, much rarer conditions. And therein lies the problem.

The conditions in question are mostly what are known as microdeletions. These are syndromes or disabilities that are a result of a small missing DNA piece in one chromosome. An example is the 22q11.2 deletion — a small missing piece of chromosome 22 that can lead to a developmental disorder called DiGeorge syndrome. This occurs in perhaps 1 in 4,000 births.

There are many kinds of these microdeletions, with varying prevalence, though all are quite rare. The claim, made frequently by NIPT-providing companies, is that the tests can detect microdeletions in the same way they detect Down syndrome or sex chromosomes. In a sense, they can. But in another sense, they are limited.

The Bayes’ analysis

To see the main issue, consider the test from a company called Harmony for this 22q11.2 microdeletion. Harmony provides some details about its test performance in this document.

According to the company’s analyses, the test detected 75% of cases with this deletion, and it saw only a 0.5% false positive rate. That is, of the cases without the microdeletion, only 0.5% of them showed a positive test result. As noted above, other sources put the overall risk of this microdeletion at about 1 in 4,000.

Let’s think about what that means if you do get a positive result.

To be concrete, imagine we have 80,000 people being tested. We expect, based on the underlying risk, that 20 of them are carrying a fetus with this microdeletion. When the 80,000 individuals are NIPT tested, 75% of those 20 cases (or 15, in expectation) will show up as positive tests. In addition, of the 79,980 people being tested who do not have a fetus with a microdeletion, 0.5% of them will get a false positive. That’s around 400 people.

Altogether, there are 415 positive tests: the 15 true positives, and 400 false positives. So if you get a positive test result, the chance that the fetus is actually affected is about 3.6%.
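For anyone who wants to check the arithmetic, here is a minimal sketch of the same calculation in Python. The function and variable names are my own for illustration; the inputs are the figures quoted above (80,000 people tested, a 1 in 4,000 baseline risk, 75% detection, and a 0.5% false positive rate).

```python
def positive_test_breakdown(n_tested, prevalence, detection_rate, false_positive_rate):
    """Expected true positives, false positives, and the chance that a
    positive result reflects a truly affected fetus."""
    affected = n_tested * prevalence          # fetuses with the microdeletion
    unaffected = n_tested - affected          # fetuses without it
    true_pos = affected * detection_rate      # affected cases the test flags
    false_pos = unaffected * false_positive_rate  # unaffected cases flagged anyway
    chance_affected = true_pos / (true_pos + false_pos)
    return true_pos, false_pos, chance_affected


# Figures from the example in the text (hypothetical cohort of 80,000).
true_pos, false_pos, ppv = positive_test_breakdown(
    n_tested=80_000,
    prevalence=1 / 4_000,
    detection_rate=0.75,
    false_positive_rate=0.005,
)

print(f"{true_pos:.0f} true positives, {false_pos:.0f} false positives")
print(f"Chance the fetus is affected after a positive test: {ppv:.1%}")
```

The size of the cohort doesn't matter for the final percentage; 80,000 is just a convenient number that makes the counts come out round.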

This is precisely the point that is made in the NYT story — that with these tests, which seem so accurate, in the vast majority of cases, even after a positive test, the fetus is in fact not affected.

The whole calculation is a straightforward application of Bayes’ rule, which I discussed at greater length here. Intuitively, though, it somehow feels wrong. On its face, this sounds like a really good test! It detects 75% of cases, with only a 0.5% false positive rate. That seems like it should be helpful. And the fact is that it is really helpful, and it is hugely informative. Before the test, the risk was 1 in 4,000. After a positive test, it’s about 4 in 100. That is a risk more than 100 times higher; you’ve learned so much. You just haven’t learned everything.
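For those who do want the equation, here is Bayes’ rule written out with the Harmony numbers. The labels "affected" and "positive" are shorthand introduced here for "the fetus has the microdeletion" and "the test comes back positive"; the inputs are the 75% detection rate, the 0.5% false positive rate, and the 1 in 4,000 baseline risk.

```latex
\[
P(\text{affected} \mid \text{positive})
  = \frac{P(\text{positive} \mid \text{affected})\, P(\text{affected})}
         {P(\text{positive} \mid \text{affected})\, P(\text{affected})
          + P(\text{positive} \mid \text{not affected})\, P(\text{not affected})}
\]
\[
  = \frac{0.75 \times \tfrac{1}{4000}}
         {0.75 \times \tfrac{1}{4000} + 0.005 \times \tfrac{3999}{4000}}
  \approx 0.036
\]
```

The second term in the denominator, the false positives, is more than 25 times larger than the first, and that is what drags the answer down.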

The reason you can simultaneously have an excellent test and still have this residual uncertainty is that the condition is very rare. This means that even a small false positive rate produces a large number of false positive cases relative to the handful of true positives. The lower the baseline risk, the more significant this issue is.
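To see how much the baseline risk matters, here is a rough sketch that holds the test's performance fixed at the Harmony figures (75% detection, 0.5% false positives) and varies only how common the condition is. Only the 1 in 4,000 row corresponds to the example above; the other baseline risks are hypothetical values chosen for illustration, and in reality a test's performance would likely differ from one condition to another.

```python
def chance_affected_given_positive(prevalence, detection_rate=0.75,
                                   false_positive_rate=0.005):
    """Bayes' rule: share of positive tests that are true positives."""
    true_pos = prevalence * detection_rate
    false_pos = (1 - prevalence) * false_positive_rate
    return true_pos / (true_pos + false_pos)


# Hypothetical baseline risks, from fairly common to extremely rare.
for one_in in (100, 1_000, 4_000, 20_000):
    chance = chance_affected_given_positive(1 / one_in)
    print(f"Baseline risk 1 in {one_in:>6,}: {chance:.1%} chance after a positive test")
```

With the test itself unchanged, the chance that a positive result is real falls steadily as the condition gets rarer.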

Where’s the fire?

All of this is clear from an analysis of the published materials. If you read the fine print and did the calculation, in principle the information was there. It’s not that the companies *lied* about their accuracy, at least not in terms of the numbers.

So what’s the issue driving the New York Times coverage? Primarily, it’s the companies overselling the accuracy of their tests, and the (completely understandable) patient reaction. In most cases, the literature from providers pays lip service to the idea that patients should undergo confirmatory diagnostic testing before considering pregnancy termination or other measures. These confirmatory tests include either a CVS test or an amniocentesis; both are more invasive, so they aren’t likely to be the first step for many families, but they can provide certainty.

The companies say that this confirmation may be necessary. But in the same breath, the literature promotes the incredible accuracy of the tests.

The Times story uncovers cases of patients who underwent significant stress as a result of these false positive results, and even identifies cases in which patients terminated a pregnancy before confirmatory testing. That should not happen. Companies should not overstate the accuracy of their results. And doctors should be incredibly careful in how they present these tests and results to patients. Notably, I’d argue that it’s crucial to be clear with patients up-front about what they should expect with a positive result. Some of the biggest problems come when patients hear “positive” with little context; it can be difficult to grasp the nuances of false positives in a heightened emotional state.

From a patient standpoint, I think there is an important question about whether these tests are a good idea. On one hand, they do provide some information. On the other, the conditions they test for are very rare and in many cases somewhat poorly understood in terms of their impact. As a person who loves data, I err toward more information being better. But it’s only better if you understand and use it correctly.