As many of you know, another of my projects during this crazy time is the website COVID-Explained. We’ve been working to keep it up to date (new results are coming in all the time!) and hope you’ll keep checking it out.

And now: the site has its own newsletter, which you can sign up for here. We’ll post twice a week, flagging our newer updates, and showing some neat visuals for our “Explainers”. I’m hoping many of you will sign up, and when you see something you like, forward it on.

Like it or not, the world is reopening, and we are hoping the information here can help us do it a little more safely.

And now onto today’s topic: data.

**Case Fatality Rates**

After last week’s post on grandparents and day care, a number of people asked me: What is the piece of data you think we are most missing about the epidemic? As in, what would be the most valuable for decision-making? It wasn’t hard to answer: What I’d most want to know is the serious illness and fatality rates from the virus, by age, and based on other risks.

That is, I want some kind of calculator where I could enter age, gender, and other risk factors (e.g., diabetes, hypertension, or other medical conditions), and the calculator would give me an estimate of hospitalization or mortality risk from COVID-19 infection.

This is perhaps morbid, but it would also be really, really valuable. Why?

Think about the case where your mom is a healthy 79-year-old: no diabetes or hypertension, no other risk factors. You’re trying to decide whether to see her or not, and you’re trying to use the methods I outlined: consider the risk of infection, multiplied by the chance that something really terrible happens.

For this decision, it matters an awful lot whether the mortality risk from COVID-19 is 0.1% or 1% or 10%. These may all seem “small”, but 10% is literally 100 times as big as 0.1%. The risk of mortality from seasonal flu is above 0.1% but below 1%; seeing COVID-19 numbers in black and white would let us compare them more directly to risks we are taking more routinely.
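To make that concrete, here is a rough sketch of the multiplication. The 2% chance of infection is a made-up number purely for illustration, and `overall_risk` is just a name chosen for this example.

```python
# Overall risk = chance of infection x chance of death if infected.
# ASSUMPTION: the 2% infection chance is invented for illustration only.

def overall_risk(p_infection, p_death_given_infection):
    """Chance of the worst outcome from one decision, as a probability."""
    return p_infection * p_death_given_infection

p_infection = 0.02  # hypothetical, not an estimate

for fatality in (0.001, 0.01, 0.10):
    risk = overall_risk(p_infection, fatality)
    # Express as "about 1 in N" so the three cases are easy to compare.
    print(f"fatality {fatality:.1%}: overall risk about 1 in {round(1 / risk):,}")
```

Under that invented infection chance, the same visit ranges from roughly 1 in 50,000 to 1 in 500, depending on which fatality rate is right.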

Unfortunately, actually getting the numbers we would want here is very hard. In my post on grandparents, I showed some figures on this, and mentioned they were not easy to interpret. Here, I thought I might go into a little more detail about why this problem is hard, and what we do know at this point (it’s not much). I’m going to focus here on mortality risk.

*Why is this Hard?*

Let’s back up and think about what we would like to know. We are looking for the ‘case fatality rate’. That is: the chance of death given infection.

Conceptually, this is simple: it’s the number of deaths divided by the total number of infections. The problem is that it is really, really hard to know the total number of infections. (It’s also not that easy to know the total deaths, since people may die from COVID-related complications without us knowing they have COVID, but on net this is the better measured quantity).

We do not know the total number of infections because of a combination of limited testing availability, people not pursuing testing, and mild and asymptomatic infections. It’s not that we *couldn’t* know this number, but we’d need much better testing, testing which targeted a random sample of the population or tested everyone in some group.

In the absence of this, the observed number of infections is lower than the actual number. Since the case fatality rate is deaths divided by infections *that we see*, it will be too big.

It’s not impossible to adjust for this, if you know what share of infections are detected. But, unfortunately, this is also unknown. To figure it out, we’d need the same kind of random or universal testing data.
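The adjustment itself is simple arithmetic once you are willing to assume a detection share. In this sketch the counts and the detection shares are hypothetical placeholders, not data, and the function names are only for illustration.

```python
# Adjusting an observed fatality rate for undetected infections.
# ASSUMPTION: all numbers below are hypothetical placeholders.

def observed_rate(deaths, detected_infections):
    """Deaths divided by the infections we actually see."""
    return deaths / detected_infections

def adjusted_rate(deaths, detected_infections, detection_share):
    """Deaths divided by estimated total infections.

    detection_share is the fraction of all infections that get detected
    (0 < detection_share <= 1), which is the unknown quantity.
    """
    total_infections = detected_infections / detection_share
    return deaths / total_infections

deaths, detected = 50, 1000  # hypothetical counts
print(f"observed: {observed_rate(deaths, detected):.1%}")
for share in (1.0, 0.5, 0.1):
    print(f"detection share {share:.0%}: {adjusted_rate(deaths, detected, share):.1%}")
```

The smaller the share of infections we detect, the further the observed rate overstates the true one, which is why the unknown detection share matters so much.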

Again, in the absence of this, the best we can do is *make some assumptions*, which is mostly what researchers do. In the last post I cited this article from the Lancet, which makes an effort to adjust case fatality rates for both under-testing and the share of cases that are asymptomatic. They come up with (for example) a fatality rate of 8 in 100 for those over 80.

These estimates assume a 30% asymptomatic rate. That’s consistent with some of the data we see. But it’s not reflective of all of the data. I’ve seen estimates as low as 5% asymptomatic. A comprehensive analysis in a nursing home in Washington State showed 60% of cases were asymptomatic.

These numbers matter *a lot*. If the asymptomatic rate were 60% rather than 30%, the actual fatality rate for those over 80 would be much lower, more like 4 in 100 than 8 in 100. If the share of infections that are asymptomatic were only 5%, the case fatality rate would be more than 10 in 100.
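A rough sketch of this kind of sensitivity check is below. It assumes deaths occur only among symptomatic infections and that the fatality rate among symptomatic infections stays fixed, so the overall rate scales with the symptomatic share. That is a simplification for illustration, not necessarily how the Lancet authors made their adjustment.

```python
# Rescale a fatality rate when the assumed asymptomatic share changes.
# ASSUMPTION: deaths occur only among symptomatic infections, and the rate
# among symptomatic infections stays fixed. This is a simplification.

def rescale_fatality(rate, old_asymptomatic, new_asymptomatic):
    """Rate per infection under a different asymptomatic share."""
    symptomatic_rate = rate / (1 - old_asymptomatic)
    return symptomatic_rate * (1 - new_asymptomatic)

baseline = 0.08  # 8 in 100 for those over 80, estimated with 30% asymptomatic

for share in (0.05, 0.30, 0.60):
    rate = rescale_fatality(baseline, 0.30, share)
    print(f"asymptomatic {share:.0%}: about {rate * 100:.1f} in 100")
```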

And we can layer all kinds of other complaints on top of this. The best data we have comes from the very start of the epidemic (in China). As things have evolved we have learned more about treatment (meaning the fatality rates may have gone down). On the other hand, the population in the US differs in other ways from the Chinese population (this could mean fatality rates are higher or lower).

*What about Other Risk Factors?*

All the challenges I highlight above are problems with just learning the basic fatality rates by age. But that’s not everything we’d want to know. We have a sense that COVID-19 is more risky for people with other medical conditions. We’d ideally like a *complete sense* of how all of these factors work together. I want an answer to: what is the precise fatality risk for someone who is 81 years old with high blood pressure but no other significant medical conditions?

It is hopeless to think about getting anything close to this with our current data. The best we can do at the moment is to analyze data on the risk factors for mortality conditional on hospitalization. For example, there is this meta-analysis, which illustrates that patients with high blood pressure, other heart problems, or diabetes are more likely to die in the hospital.

Again, this gives us a sense of the *direction* of the risks, but not their magnitude. More medical risk factors clearly seem to equal higher risk, but without knowing the hospitalization rates across these different groups, or the baseline risk of infection (as I complained about above) we have no way to translate this into magnitudes.

In thinking this through it strikes me that one might be able to combine some of these data together to do better. For example, if I had a sense of the average risk for some age group, I could combine that with the data on relative risks for people with other medical conditions, and also with the prevalence of those conditions in the population. Together, these would imply some absolute risk estimates by group.

I would encourage someone to try this and report back. It’s a bit in the weeds for a newsletter, even this one.
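For anyone tempted, a minimal sketch of the combination for a single risk factor might look like this. The 30% prevalence and the relative risk of 2 are invented for illustration, and treating a relative risk measured in hospitalized patients as if it applied to everyone infected is itself a big assumption.

```python
# Split an age group's average fatality risk into risks with and without
# one condition, using prevalence and relative risk.
# ASSUMPTIONS: the prevalence and relative risk below are invented, and the
# relative risk is treated as applying to everyone infected in the group.

def split_risk(average_risk, prevalence, relative_risk):
    """Return (risk_without_condition, risk_with_condition).

    Solves average = (1 - prevalence) * r0 + prevalence * relative_risk * r0.
    """
    r0 = average_risk / (1 - prevalence + prevalence * relative_risk)
    return r0, relative_risk * r0

average_risk = 0.08  # 8 in 100, the over-80 estimate cited earlier
without, with_condition = split_risk(average_risk, prevalence=0.30, relative_risk=2.0)
print(f"without condition: {without:.1%}, with condition: {with_condition:.1%}")
```

With several conditions at once you would also need to know how they overlap and interact, which is exactly the data we don’t have.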

The bottom line is that the best you can do, at the moment, is take an estimate of the baseline risk for a particular age group and adjust it up or down in some arbitrary way based on other risk factors.

Making this even more complex, there is huge variation even within risk groups in the consequences of the virus. Within nursing homes, there are a huge number of deaths and also a large share of people who have totally asymptomatic infections. This has led to speculation that there is some genetic variation which drives severity, but we do not yet know if that is right or how we’d figure out who is more at risk.

*Can You Tell Me Anything?*

Well, yes. Here is what I would say:

- The evidence points to the conclusion that for some age groups (children, teens, people in their 20s) the COVID-19 case fatality rate is low, probably significantly lower than that of the seasonal flu. (The infection rate may be higher; I’m talking here just about the mortality risk conditional on infection.)
- It also seems clear that the risk to the elderly is quite a bit higher than the risk from the seasonal flu.
- The risks are much higher for those with other risk factors.

Honestly, that’s about it. It’s very frustrating. I feel the data could serve us better. It’s not that no one is working on this! Johns Hopkins has a comprehensive data site about mortality risks. But (as they note) they are basing their data on reported cases and deaths. We know that’s misleading!

Yes, the problem is hard, but we could do so much better. So view this newsletter partly as a plea. I know some of you are data people! Can’t we get something better here?