Tracking disease outbreaks from sparse data with Bayesian inference

The COVID-19 pandemic provides new motivation for a classic problem in epidemiology: estimating the empirical rate of transmission during an outbreak (formally, the time-varying reproduction number) from case counts. While standard methods exist, they work best at coarse-grained national or state scales with abundant data, and struggle to accommodate the partial observability and sparse data common at finer scales (e.g., individual schools or towns). For example, case counts may be sparse when a testing program catches only a small fraction of infections. Moreover, whether an infected individual tests positive may depend on the kind of test and on when they are tested. We propose a Bayesian framework that accommodates partial observability in a principled manner. Our model places a Gaussian process prior over the unknown reproduction number at each time step and models observations as samples from the distribution of a specific testing program. In particular, the framework can accommodate a variety of tests (viral RNA, antibody, antigen, etc.) and sampling schemes (e.g., longitudinal or cross-sectional screening). Inference in this framework is complicated by the presence of tens or hundreds of thousands of discrete latent variables. To address this challenge, we propose an efficient stochastic variational inference method that relies on a novel gradient estimator for the variational objective. Experimental results for an example motivated by COVID-19 show that our method produces an accurate and well-calibrated posterior, while standard methods for estimating the reproduction number can fail badly.
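To make the modeling setup concrete, the following is a minimal sketch of one possible generative process of this kind; it is not the paper's model or inference method. Everything specific in it is an assumption introduced for illustration: the squared-exponential kernel and its hyperparameters, the log link for the reproduction number, the discretized generation-interval weights, the Poisson renewal step, and the single constant detection probability standing in for a testing program. The function names `rbf_kernel` and `simulate_outbreak` are hypothetical.

```python
import numpy as np


def rbf_kernel(times, length_scale=7.0, variance=0.1):
    """Squared-exponential covariance between time points (assumed kernel)."""
    diffs = times[:, None] - times[None, :]
    return variance * np.exp(-0.5 * (diffs / length_scale) ** 2)


def simulate_outbreak(num_days=60, detection_prob=0.1, seed=0):
    """Sample R_t from a GP prior, then simulate infections and sparse case counts."""
    rng = np.random.default_rng(seed)
    times = np.arange(num_days, dtype=float)

    # GP prior on log R_t; the jitter keeps the Cholesky factorization stable.
    cov = rbf_kernel(times) + 1e-6 * np.eye(num_days)
    chol = np.linalg.cholesky(cov)
    log_r = chol @ rng.standard_normal(num_days)
    r_t = np.exp(log_r)

    # Assumed discretized generation interval: weight of an infection
    # that occurred 1, 2, ... days ago (weights sum to 1).
    gen_interval = np.array([0.1, 0.2, 0.3, 0.2, 0.1, 0.1])

    infections = np.zeros(num_days, dtype=np.int64)
    infections[0] = 5  # assumed number of seed infections
    for t in range(1, num_days):
        lags = min(t, len(gen_interval))
        recent = infections[t - lags:t][::-1]  # most recent day first
        pressure = np.dot(recent, gen_interval[:lags])
        infections[t] = rng.poisson(r_t[t] * pressure)

    # Partial observability: each infection is detected independently
    # with a fixed probability (a stand-in for a real testing program).
    observed = rng.binomial(infections, detection_prob)
    return r_t, infections, observed


if __name__ == "__main__":
    r_t, infections, observed = simulate_outbreak()
    print("total infections:", infections.sum())
    print("total observed cases:", observed.sum())
```

With a small `detection_prob`, the observed counts are a sparse, noisy view of the latent infections. That is the regime the abstract describes, in which the true daily infection counts become discrete latent variables that inference must account for.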
