Predictive Mathematical Models of the COVID-19 Pandemic: Underlying Principles and Value of Projections.

Numerous mathematical models are being produced to forecast the future of coronavirus disease 2019 (COVID-19) epidemics in the US and worldwide. These predictions have far-reaching consequences regarding how quickly and how strongly governments move to curb an epidemic. However, the primary and most effective use of epidemiological models is to estimate the relative effect of various interventions in reducing disease burden rather than to produce precise quantitative predictions about extent or duration of disease burdens. For predictions, “models are not crystal balls,” as Ferguson noted in a recent overview of the role of modeling.1 Nevertheless, consumers of epidemiological models, including politicians, the public, and the media, often focus on the quantitative predictions of infections and mortality estimates. Such measures of potential disease burden are necessary for planners who consider future outcomes in light of health care capacity. How then should such estimates be assessed? Although relative effects on infections associated with various interventions are likely more reliable, accompanying estimates from models about COVID-19 can contribute to uncertainty and anxiety. For instance, will the US have tens of thousands or possibly even hundreds of thousands of deaths? The main focus should be on the kinds of interventions that could help reduce these numbers because the interventions undertaken will, of course, determine the eventual numerical reality. Model projections are needed to forecast future health care demand, including how many intensive care unit beds will be needed, where and when shortages of ventilators will most likely occur, and the number of health care workers required to respond effectively. Short-term projections can be crucial to assist planning, but it is usually unnecessary to focus on long-term “guesses” for such purposes. In addition, forecasts from computational models are being used to establish local, state, and national policy.
When is the peak of cases expected? If social distancing is effective and the number of new cases that require hospitalization is stable or declining, when is it time to consider a return to work or school? Can large gatherings once again be safe? For these purposes, models likely only give insight into the scale of what is ahead and cannot predict the exact trajectory of the epidemic weeks or months in advance. According to Whitty, models should not be presented as scientific truth; they are most helpful when they present more than what is predictable by common sense.2 Estimates that emerge from modeling studies are only as good as the validity of the epidemiological or statistical model used; the extent and accuracy of the assumptions made; and, perhaps most importantly, the quality of the data to which models are calibrated. Early in an epidemic, the quality of data on infections, deaths, tests, and other factors is often limited by underdetection or inconsistent detection of cases, reporting delays, and poor documentation, all of which affect the quality of any model output. Simpler models may provide less valid forecasts because they cannot capture complex and unobserved human mixing patterns and other time-varying characteristics of infectious disease spread. On the other hand, as Kucharski noted, “complex models may be no more reliable than simple ones if they miss key aspects of the biology. Complex models can create the illusion of realism, and make it harder to spot crucial omissions.”3 A greater level of detail in a model may provide a more adequate description of an epidemic, but outputs are sensitive to changes in parametric assumptions and are particularly dependent on external preliminary estimates of disease and transmission characteristics, such as the length of the incubation and infectious periods. In predicting the future of the COVID-19 pandemic, many key assumptions have been based on limited data.
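The sensitivity of model outputs to parametric assumptions can be illustrated with a minimal sketch. The article does not specify any particular model, so the code below assumes a basic susceptible-infectious-recovered (SIR) model for illustration only; the function name `simulate_sir`, the population size, and every parameter value are hypothetical choices, not estimates for COVID-19.

```python
def simulate_sir(r0, infectious_days, population=1_000_000,
                 initial_infected=10, days=365, dt=0.1):
    """Integrate a basic SIR model with forward Euler steps.

    All inputs are illustrative assumptions, not COVID-19 estimates.
    Returns (peak_infected, peak_day, total_ever_infected).
    """
    gamma = 1.0 / infectious_days   # recovery rate per day
    beta = r0 * gamma               # transmission rate implied by R0
    s = float(population - initial_infected)
    i = float(initial_infected)
    peak_infected, peak_day = i, 0.0

    for step in range(int(days / dt)):
        new_infections = beta * s * i / population * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        if i > peak_infected:
            peak_infected, peak_day = i, (step + 1) * dt

    return peak_infected, peak_day, population - s


# Vary the assumed reproductive number and infectious period and
# compare the resulting peak size, peak timing, and cumulative infections.
for r0, infectious_days in [(1.5, 5), (2.5, 5), (2.5, 10)]:
    peak, day, total = simulate_sir(r0, infectious_days)
    print(f"R0={r0}, infectious period={infectious_days} d: "
          f"peak={peak:,.0f} on day {day:.0f}, total infected={total:,.0f}")
```

Under these assumptions, modest changes in the assumed reproductive number or infectious period shift both the height and the timing of the epidemic peak, which is why preliminary estimates of such quantities matter so much for projections.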
Models may capture aspects of epidemics effectively while neglecting to account for other factors, such as the accuracy of diagnostic tests; whether immunity will wane quickly; whether reinfection could occur; or population characteristics, such as age distribution, percentage of older adults with comorbidities, and risk factors (eg, smoking, exposure to air pollution). Some critical variables, including the reproductive number (the average number of new infections associated with 1 infected person) and social distancing effects, can also change over time. However, many reports of models do not clearly state the key assumptions that have been included or the sensitivity to errors in these assumptions. Predictive models for large countries, such as the US, are even more problematic because they aggregate heterogeneous subepidemics in local areas. Individual characteristics, such as age and comorbidities, influence risk of serious disease from COVID-19, but population distributions of these factors vary widely in the US. For example, the population of Colorado is characterized by a lower percentage of comorbidities than many southern states. The population in Florida is older than the population in Utah. Even within a state, key variables can vary substantially, such as the prevalence of important prognostic factors (eg, cardiovascular or pulmonary disease) or environmental factors (eg, population density, outdoor air pollution). Social distancing is more difficult to achieve in urban than in suburban or rural areas. In addition, variation in the accuracy of disease incidence and prevalence estimates may occur because of differences in testing between areas. Consequently, projections from various models have resulted in a wide range of possible outcomes. For instance, an early estimate suggested that COVID-19 could account for 480 000 deaths in the US,4 whereas later models