Bayesian adjustment for preferential testing in estimating infection fatality rates, as motivated by the COVID-19 pandemic

A key challenge in estimating the infection fatality rate (IFR), and its relationship to various factors of interest, is determining the total number of cases. This total is unknown, not only because not everyone is tested but also, more importantly, because tested individuals are not representative of the population at large. We refer to the phenomenon whereby infected individuals are more likely to be tested than noninfected individuals as "preferential testing." An open question is whether the IFR can be reliably estimated without any specific knowledge of the degree to which the data are biased by preferential testing. In this paper we take a partial identifiability approach, stating clearly where deliberate prior assumptions can be made and presenting a Bayesian model that pools information from different samples. When the model is fit to European data obtained from seroprevalence studies and national official COVID-19 statistics, we estimate the overall COVID-19 IFR for Europe to be 0.53%, 95% C.I. = [0.38%, 0.70%].
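To make the effect of preferential testing concrete, the following is a minimal arithmetic sketch, not the model used in the paper. It assumes a perfectly accurate test and a hypothetical multiplier `phi` by which infected individuals are more likely to be tested than noninfected individuals; the function names and all numbers are illustrative assumptions.

```python
def expected_test_counts(population, prevalence, base_test_prob, phi):
    """Expected (tests, positives) when infected people are phi times
    as likely to be tested as noninfected people (assumed mechanism)."""
    infected = population * prevalence
    noninfected = population - infected
    # Cap the testing probability for infected people at 1.
    p_test_infected = min(1.0, phi * base_test_prob)
    positives = infected * p_test_infected    # perfect test assumed
    negatives = noninfected * base_test_prob
    return positives + negatives, positives


def naive_ifr(deaths, population, tests, positives):
    """IFR estimate that wrongly treats tested people as representative:
    scales test positivity up to the whole population."""
    estimated_infections = population * positives / tests
    return deaths / estimated_infections


# Illustrative (made-up) scenario: 5% prevalence, true IFR of 0.5%.
population = 1_000_000
prevalence = 0.05
true_ifr = 0.005
deaths = population * prevalence * true_ifr

# No preferential testing (phi = 1) versus strong preferential testing (phi = 5).
ifr_unbiased = naive_ifr(deaths, population,
                         *expected_test_counts(population, prevalence, 0.02, 1.0))
ifr_biased = naive_ifr(deaths, population,
                       *expected_test_counts(population, prevalence, 0.02, 5.0))
print(ifr_unbiased, ifr_biased)
```

Under these assumptions, the naive estimate matches the assumed IFR when `phi = 1` and falls below it when `phi > 1`, because test positivity then overstates the number of infections. Since `phi` is not known in practice, this is the kind of quantity on which a prior assumption would have to be placed.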
