Welcome to pandemic hysteria. Have you read any articles predicting the end of civilization yet? I have. Have you seen claims that every step taken by the government is a mistake or we have acted too much or too little or too early or too late? I am confident that you have.
Why are there so many opinions? Why don’t we know what to do? Why can’t we just do what science says and everything will be fine? Why don’t I know how many people are going to die or what amount of social isolation is appropriate? Because I cannot know and neither can anyone else.
Let me let you in on a little secret: We suck at predicting. Science sucks at predicting. Virtually every prediction you hear is bogus, whether it be about a coronavirus or the weather or the stock market. The problem, though, is that people want certainty; they want to know what will happen, and the world is full of self-absorbed know-it-alls who will offer their opinions about what will happen. About any topic.
I remember in 2007 when Apple previewed the first iPhone. I watched and was excited to get one. Yes, I stood in line when they came out. But I remember very clearly reading all of the pundits and predictors who were positive that the iPhone would be a complete and unmitigated disaster. I remember trying to reconcile all of the enthusiasm I was seeing personally for the product with the overwhelming certainty among the “experts” (that is, people who write drivel for a living and talk on TV shows) that the phone would go down in history as one of the dumbest and most unsuccessful products ever. Here is a sample if you want to read some of the claims. They are hilarious. Of course, in hindsight, the iPhone is perhaps the most successful product in human history. How could the people in the know be so wrong? And so confidently wrong?
I have written before about how bad pollsters are at predicting elections. If you were alive in 2016, you already know this. Plenty of evidence shows that financial experts’ predictions about how the stock market will do in the following year are less accurate than a coin toss. Ten days out, weather forecasts are no better than 50% accurate. Albert Einstein predicted in 1932 that nuclear energy production would never be possible. Decca Records predicted that the Beatles would flop. Western Union predicted that the telephone would not be successful. IBM predicted that there wasn’t a need for more than about five computers in the world. When that turned out to be false, they then predicted (along with DEC) that the personal computer was foolish because no one would want a computer in their home. The lightbulb was a fad, the car would never replace the horse, television would never take off, smoking doesn’t cause cancer (said the National Cancer Institute in 1954), a rocket would never be able to leave the earth’s atmosphere, etc. We could do this for hours.
In fact, what is difficult to find is an accurate prediction about the future. Sensationalistic, doomsday predictions sell papers and get clicks and glue eyes to meaningless talk news programs and their sponsors. Here are some past predictions:
- 1967: There will be a dire famine by 1975.
- 1970: There will be an ice age by 2000.
- 1971: There will be a new ice age by 2020. (Washington Post)
- 1974: The Earth is unavoidably headed for another ice age. (Time)
- 1976: Scientific consensus is that the Earth is cooling and famines are coming soon. (NY Times)
- 1978: No end in sight for global cooling. (NY Times)
- 1988: The Maldives will be underwater by 2018. (United Nations)
- 1989: Rising waters will obliterate nations by 2000 if nothing is done. (United Nations)
Okay, you get the point. If you want dozens more of these, I stole them from here. I don’t even like the source of the articles and I’m not endorsing the author or the organization or his conclusions, but they are good examples. In addition to global cooling and rising waters, there were also peak oil predictions, overpopulation predictions, killer bee predictions, the end of snow, underwater Manhattan predictions, ineffective antibiotic predictions, no more minerals predictions, fat-phobia predictions, etc. There is always something that is going to destroy life as we know it, and it will happen with scientific certainty. These predictions, by the way, were not from crackpots: they were from leading scientists, the World Health Organization, the United Nations, governmental agencies, etc. Serious scientists do serious science.
If you want to do something interesting, try reading the daily paper from a week or a month ago. Do it for a few days and you will be shocked at how wrong people are in predicting what a news story means or how important it is. It’s just silly speculation filtered through a person’s own biases. It is not science. You could use the Internet Archive to do this. For example, here is the Washington Post 10 years ago. In particular, focus on the opinions, the interpretations of the news, and the predictions in the writing. Play with your favorite website or newspaper or magazine. It is comical.
But why are people so often so bad at prediction? Let’s talk about some reasons and maybe even the novel coronavirus.
First and foremost, we are bad at predicting because we don’t understand the relevant data. Take weather forecasting, for example. We know what influences the weather, but it is governed by chaos theory: small, practically immeasurable differences in variables can make a huge difference in the outcome of the system. Some have called this the Butterfly Effect, where the flapping of the wings of a butterfly in Brazil causes a tornado in Kansas. We understand the science of how that could happen, but we just can’t know enough of the variables in enough detail to make very accurate predictions of the weather even a week from now. The truth is, we aren’t even measuring enough variables, and we don’t know all the ones we should be measuring.
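If you want to see how quickly tiny differences blow up, here is a minimal sketch. It is not a weather model; it uses the logistic map, a classic toy chaotic system, just to show two nearly identical starting values ending up in completely different places:

```python
# A toy illustration of chaos: the logistic map x -> r * x * (1 - x).
# Two starting values that differ by one part in a billion end up in
# completely different places after a few dozen iterations.

def logistic_trajectory(x0, r=4.0, steps=50):
    """Iterate the logistic map from x0 and return the final value."""
    x = x0
    for _ in range(steps):
        x = r * x * (1 - x)
    return x

a = logistic_trajectory(0.200000000)
b = logistic_trajectory(0.200000001)  # differs by one billionth

print(f"start 0.200000000 -> {a:.6f}")
print(f"start 0.200000001 -> {b:.6f}")
# The two outcomes bear no resemblance to each other, even though the
# inputs were practically indistinguishable -- the butterfly effect.
```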
Second, just because we know the data and can measure them doesn’t mean we know all of the important factors or variables that explain the outcome. In other words, we don’t have a complete model. What causes people to get heart disease? We know some of the causes and some very important ones at that, but we likely have an understanding of heart disease that informs only a small percentage of an accurate model. There is a lot we simply do not know.
Third, we tend to see the world as we want to see it and then massage the data to support our predetermined beliefs (think confirmation bias, motivated reasoning, optimism fallacy, naive realism). Do you want the stock market to go up? You’ll look for signs that it will do so. Are you a fatalist and are looking for the world to end? You’ll see all sorts of reasons why it will. Think your boyfriend is cheating on you? It won’t take you long to be convinced by the “evidence.”
Fourth, we tend to believe that the world changes less than it actually does. Remember all of those crazy predictions? That the iPhone would flop, or the television, or the car, or the lightbulb? Those predictions chiefly stem from a belief that the world, as we know it now, will forever be; why would it be different or why would we want it to be different? This is another form of the optimism bias and it is pervasive. So when we see evidence that the world is changing in any way, cooling or warming, less rain or more rain, more city folk and fewer country folk, more mumble rap and less Beethoven (ok maybe that one is true), we tend to predict disaster as a consequence, and we are almost always wrong. The world changes in ways that we cannot understand or predict; we want things to continue in a straight line; we like predictability and certainty.
Fifth, when there is change, we assume that it will progress in a linear (or at least predictable) manner. If stocks have gone up 7% a year for 10 years, then they will continue to do that for the next 30 years (says every quacky “financial planner” out there). If Dollar General stores continue at their current pace of growth, the whole world will be covered with Dollar General stores in 15 years. If coronavirus infections continue at their current pace, 100% of people on the planet will be infected in two months. All these sorts of predictions are predicated on a complete lack of understanding of math and complex systems.
Lastly, many people have an illusion of control, wherein they believe they have more control over outcomes than they actually do and this belief tricks them into having more confidence in what they know than they should. How much control do we actually have over climate change or the spread of the coronavirus? We have some control to be sure but not nearly as much as many people would assume.
Let me talk for just a minute about complex systems versus complicated systems because they are often confused by people who pretend to do science for a living. A complicated system is something like rocket science: we know all of the relevant factors that need to be accounted for to put a rocket into space. There are a lot of them, but we know everything we need to know to figure out how to launch a rocket and put it anywhere we need to in the galaxy. If we make mistakes, it is because we failed to account for something we should have (like wind resistance or humidity in the upper atmosphere) or it is because we made a rounding or measurement error. It is complicated, but we know what we need to know to do it. We also completely understand and can model on a computer how changing any single variable will affect the whole system: add some more weight and we can predict how that will change the eventual orbit of the rocket, for example.
On the other hand, complex systems are things like the weather or biology or the stock market. A complex system is defined by the idea that you cannot predict the outcome of the system by changing any given component (or even groups of components). Imagine if you added 200 lbs to the rocket but you couldn’t predict how that would affect the rocket in a consistent or predictable way: that’s a complex system. The stock market may go down one day because a high volume trader at a big New York firm broke up with his mistress and was in a bad mood and made a stupid sell that caused a panic on Wall Street. No one can predict that. It’s the butterfly wing flap in Brazil that causes a tornado in Oklahoma.
When I give a medicine to a patient, it doesn’t always have the desired effect. Sometimes, it produces an effect opposite to what I expect. Even when it does what I want, it doesn’t do so by the same amount in every patient. When oil prices bottom out, sometimes the stock market goes up and sometimes it goes down; it never does anything by a predictable amount. I can look at trends from the past (how have other patients done with the same medicine or what has the market done in the past when oil has gone down) and I can make some generalizations, but not with any great degree of specificity or confidence. Still, if I looked at past experiences with the same medicine in patients like the one I’m treating now, I would be at least applying science to the question at hand, albeit not with great precision or confidence. But worse is when I try to predict what will happen with a new medicine or a new variable for which I do not have past experiences. That is nearly impossible.
Think about all the things that affect the spread of SARS-CoV-2. (As an aside, coronaviruses are a family of viruses that humans and animals can get. SARS-CoV-2 is the name of the virus and COVID-19 is the name of the disease that it can cause, analogous to how HIV is the virus that causes AIDS.) Let’s imagine some of the variables:
- The intrinsic properties of the virus that affect its transmission, such as its mode of transmission (fecal, oral, respiratory, etc.), its R0 (pronounced “R naught”) or how contagious it is, its lifespan, how long it goes unnoticed by the patient before becoming symptomatic, how long a patient remains contagious, etc.
- Environmental factors like temperature, humidity, precipitation, airflow and ventilation, as well as properties of surfaces and other media that the virus might come into contact with.
- Host factors including the health status of the host (people who cough more due to smoking, for example, will spread more disease), the sociability of the host (hermits versus socialites), population density (New York versus Montana), personal habits (how much do you touch your face?), hygiene habits (hand-washing, cleaning), etc.
Now think about some of the things that might affect the severity of a COVID-19 infection (besides merely your risk of acquiring the virus):
- Age, health status, smoking status, immune status, gender, nutritional status, genetic predispositions, blood type (maybe), etc.
- Strain of virus (there are apparently several mutations of the novel coronavirus and we are not yet sure how these affect the severity of COVID-19 infection)
- Concomitant infections (yes you can get other respiratory infections at the same time)
- Availability and quality of medical interventions for the infected individual (yup, we have better health outcomes in the US than do countries like Italy and China and we have fewer smokers and I will be happy to publicly debate anyone who disagrees with that statement)
- Treatments available at the time of infection
There are lots of factors that I have not listed. Already we have a dynamic and complex model to predict the outcome of even one patient, let alone that one patient placed in a complex environment with thousands of other people who also each have their own complex stories. Very quickly, you can see that modeling this becomes about as accurate as modeling the weather. Then there are dynamic factors that evolve after the modeling takes place, such as:
- Changes in behaviors after fear of the disease sets in (increased hygiene, mask-wearing, less face touching, staying away from people, and other things that will vary widely and unpredictably from individual to individual)
- New mutations in the virus
- Changes in the availability of needed supplies (masks, medicine, ventilators, etc.)
- Discovery of new, more effective treatments
- Changes in R0 as herd immunity develops
- Changes in R0 as infection levels rise (when there are fewer people left to infect because many in a local cluster already have the virus; see the sketch just after this list)
- Other unknown events that might happen (natural disasters, wars, the plague, etc.)
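To make those last R0 bullets concrete: in standard epidemic models, the number that actually drives spread at any moment is the effective reproduction number, which is roughly R0 multiplied by the fraction of the population still susceptible. Here is a minimal sketch, with made-up numbers used purely for illustration rather than real estimates:

```python
# Rough sketch: how the effective reproduction number falls as more of
# the population has already been infected. The R0 and population here
# are assumed, illustrative values, not real estimates for SARS-CoV-2.

R0 = 2.5                # assumed basic reproduction number
population = 1_000_000  # assumed population size

for already_infected in (0, 100_000, 300_000, 600_000):
    susceptible_fraction = (population - already_infected) / population
    r_effective = R0 * susceptible_fraction
    print(f"{already_infected:>7} already infected -> "
          f"effective R = {r_effective:.2f}")

# Once effective R drops below 1, each case infects fewer than one new
# person on average and the outbreak shrinks -- the herd immunity idea.
```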
All of those things are complete unknowns. But even if we had some idea about all those variables, we still aren’t focusing on the right outcomes. When we focus solely on death from COVID-19 infection as the outcome of interest in a model, we tend to ignore other related issues and are blind to unintended consequences. Instead, we should focus on total mortality and morbidity, not just mortality and morbidity related to COVID-19. When we do this, then we must also consider other things, such as:
- The potential negative impacts of economic recession which might increase risks of malnourishment or homelessness and other negative determinants of health
- The impact of delaying surgeries and cancer screenings and routine medical care
- The potential negative impact of social isolation and economic deprivation on mood and substance abuse (causing, perhaps, an increase in suicide, overdose, and violent crime)
- The negative impact of patients abusing touted treatments like hydroxychloroquine (we have already seen deaths due to overdose)
- The long-term social and health consequences of bankruptcies, foreclosures, job loss, etc.
Again, these are just some unintended consequences that come to mind. The point: we just don’t know, we never build models that are comprehensive in this way, and therefore we are usually wrong.
How do we model a pandemic? If you want a good primer about the math (and you like calculus and eigenvalues and stuff), you can start with this article. By the way, if you aren’t comfortable with calculus and eigenvalues, please don’t offer up any opinions about predicting what will happen with COVID-19. I’ve seen countless folks who own a spreadsheet run a simple equation relating to exponential growth and then tell their friends on social media that millions of people are going to die. I’ve also seen this from journalists (aka not epidemiologists and not mathematicians) and doctors (aka not epidemiologists and not mathematicians) and data scientists (aka not epidemiologists and not biologists) and everyone else who owns Microsoft Excel or maybe can write a Python script. It’s quite comical.
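For the curious, here is a bare-bones sketch of the kind of compartmental (SIR) model that the real math builds on. This is not the model from the article linked above, and every parameter here is an assumed, illustrative number:

```python
# A bare-bones SIR (Susceptible-Infected-Recovered) model integrated with
# simple one-day Euler steps. All parameters are assumed, illustrative values.

def run_sir(population=1_000_000, initial_infected=10,
            beta=0.3,    # assumed transmission rate per day
            gamma=0.1,   # assumed recovery rate per day (so R0 = beta/gamma = 3)
            days=180):
    s = population - initial_infected  # susceptible
    i = initial_infected               # currently infected
    r = 0                              # recovered
    peak_infected = i
    for _ in range(days):
        new_infections = beta * s * i / population
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        peak_infected = max(peak_infected, i)
    return peak_infected, r

peak, total_ever_infected = run_sir()
print(f"peak simultaneous infections: {peak:,.0f}")
print(f"total ever infected after 180 days: {total_ever_infected:,.0f}")
# Even this toy model shows that growth is not a simple exponential:
# the epidemic slows and turns over as the susceptible pool shrinks.
```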
Ultimately, the accuracy of the model that we use is dependent upon:
- The complexity of the model – how many of the factors that we just included are accounted for in the model?
- The accuracy of the measurements of the variables in the model – what is the R0 for the Sars-Cov-2 virus?
- The margin of error of the variables in the model – what is the confidence interval for the R0 measurement?
- The time interval of the prediction – more unknown variables and more unknown events affect the model more significantly the longer it is projected forward (which we might call error magnification and propagation).
Most models that are being used are quite simple and employ just a few variables that are used in the exponential growth formula, namely the initial number of infections, the growth rate, and the time. This incredibly sophomoric approach to prediction is quite common on social media these days.
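For concreteness, here is roughly what that back-of-the-envelope projection looks like. The starting count and growth rate below are assumed, illustrative numbers, not real estimates:

```python
# The naive "spreadsheet" projection: cases(t) = initial * (1 + growth_rate)**t.
# Both inputs are assumed for illustration only.

initial_cases = 100   # assumed starting case count
daily_growth = 0.25   # assumed 25% growth per day

for day in (7, 14, 30, 60, 100):
    projected = initial_cases * (1 + daily_growth) ** day
    print(f"day {day:>3}: {projected:,.0f} projected cases")

# Carried out far enough, this formula happily projects more cases than
# there are people on Earth -- which is exactly why naive extrapolation fails.
```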
From this starting point, predictionologists will make some assumptions about the impact of COVID-19 by calculating the presumed morbidity and mortality. For this, they must make an assumption about what fraction of people who acquire the virus will become ill, what fraction will become severely ill, what fraction will require ICU care, and what fraction will die. Those who know a little more about spreadsheets will also stratify those variables by things like gender, age, smoking status, etc., and attempt to match the model to the population of people they are interested in.
Finally, the most sophisticated models I have seen attempt to run the model with different assumptions about the growth rate based upon different levels of isolation and distancing measures employed.
So what’s wrong with any of that? Why isn’t it accurate?
First, the variables plugged in at each step are largely unknown. We still do not have, for example, an accurate estimate of the R0. Preliminary estimates place the number between 2 and 3. That’s a big difference, and even that range is likely inaccurate. Starting with 1 person and an R0=2, after 200 periods there will be 52 infected people. Change R0 to 3 and over the same time, there will be 369. Change that to 300 periods and R0=2 results in 380 cases while R0=3 would result in 7098 cases. The equation is very sensitive to differences in R0 (and that’s just one of many variables that need to be accounted for to have an accurate model). If you want to explore what variables you need to build a better and more complete model that accounts for how diseases like this actually behave, read the eigenvalue article I linked to above as a starting point.
What’s more, R0=2-3 isn’t even based on good data and it is likely that it will change dynamically as more people become infected. The latter point is why we need to use equations much more sophisticated than the naive exponential growth equation but the former gets to the heart of a bigger problem: the data we have so far is limited and poor.
Our assumptions about the growth rate are based upon knowing how many people have the infection, as are all of our assumptions about morbidity and mortality rates. But since testing is neither random nor anywhere near widespread, we have an ascertainment bias that distorts and overestimates all of the variables in even the most complex models. In other words, we don’t know what percentage of the population is infected, what percentage of infected people get sick, get severely sick, die, etc. We haven’t tested enough people, and we haven’t tested randomly, to know these answers with enough certainty to model anything, let alone make such specific predictions. Most of our assumptions right now are based upon data from China, and if you think they are being forthright then you don’t understand geopolitics at all.
Does that mean that predictive models are useless? They aren’t entirely useless but one must assume that we are modeling absolute worst-case scenarios. Since our numbers are derived from the sickest and most vulnerable patients (the ones who we were forced to test with our limited testing capacity), then any model based on those numbers represents an extreme and worst-case prediction. It’s not wrong to prepare for the worst-case scenario.
That brings up another point about worst-case scenario preparation. When we focus only on the worst-case scenario, we tend to skew reality quite a bit. I ride a motorcycle. What’s the worst-case scenario for me riding a motorcycle? Maybe that while riding it I crash into a van carrying all of my family and best friends and cause an explosion that kills us all. Would I ever ride a motorcycle if I thought that was likely? Of course not. Or how about that I just get killed riding? Still not a pleasant thought but not all that likely. That latter scenario, though it is more likely than the other one, still doesn’t stop me from riding because it’s pretty unlikely and I have decided that the benefits of riding outweigh that risk.
Now just because it is unlikely doesn’t mean I don’t do some things to mitigate that risk, like wear a helmet or practice other safe riding skills. Those things are akin to social distancing and some of the other extraordinary measures we are taking now. They may not make much of a difference in the final analysis, but how can we be sure right now? What if it is the thing that saves a couple of million people from dying? So we do those things not because they are the end-all-be-all, must-do-this-or-we-are-all-certain-to-die, but we do them because we are leveraging against the worst-case scenario (which is very unlikely to happen).
Sure, there are some people who don’t think this way and believe that what we are doing is stupid, just like there are some people who ride a motorcycle without a helmet after 5 beers. Who’s the stupid one?
Still, models should also present the best-case scenario to provide some balance in our emotional responses. Models should do this anyway, but people who build models usually have an agenda and the model will support that agenda. For example, not all models supported that Clinton would win in 2016; some indicated that Trump would win. Did bias influence those models? Of course it did. The predictor makes assumptions and those assumptions comport with their biases.
The truth is, a model should account for the confidence of the measurements at each step of the calculation. If we believe that R0 is between 2 and 3, run the model with both extremes. If we believe that mortality is between 0.125% and 8%, run the calculations with both extremes. Again, there are many, many variables besides those in a complex model, and all of them have a confidence interval. Error propagates throughout the model unless these confidence intervals are thoughtfully accounted for at each step.
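As a rough sketch of what that looks like in practice, here is a toy calculation that runs the same crude projection with the low and high extremes of two uncertain inputs. All of the numbers are assumed for illustration; the point is only how wide the resulting range becomes:

```python
# Toy error-propagation sketch: run the same crude projection with the
# optimistic and pessimistic extremes of two uncertain inputs.
# All numbers are assumed for illustration only.

initial_cases = 1_000
days = 90

scenarios = {
    "low  (slow growth, low fatality)":  {"daily_growth": 0.05, "fatality_rate": 0.00125},
    "high (fast growth, high fatality)": {"daily_growth": 0.15, "fatality_rate": 0.08},
}

for name, p in scenarios.items():
    projected_cases = initial_cases * (1 + p["daily_growth"]) ** days
    projected_deaths = projected_cases * p["fatality_rate"]
    print(f"{name}: ~{projected_cases:,.0f} cases, ~{projected_deaths:,.0f} deaths")

# The honest answer is the entire range between those two lines,
# which is exactly why single-number predictions are misleading.
```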
At the end of such modeling, you might find a prediction like this: between 20,000 and 3 million Americans will die of COVID-19 in the next 6 months. That’s the truth. Which is more likely? Something in the middle, but where the true answer will land is unknowable right now.
That’s why predictions are almost always wrong, and that wide variation in what the model predicts is rarely reported, because it just doesn’t sound as good or as confident coming from some know-it-all on the internet or some self-absorbed “expert” on TV.