Adaptive Estimation for Epidemic Renewal and Phylogenetic Skyline Models

Estimating temporal changes in a target population from phylogenetic or count data is an important problem in ecology and epidemiology. Reliable estimates can provide key insights into the climatic and biological drivers that influence the diversity or structure of that population, and can support hypotheses about its future growth or decline. In infectious disease applications, the individuals infected across an epidemic form the target population. The renewal model estimates the effective reproduction number, R, of the epidemic from counts of its observed cases. The skyline model infers the effective population size, N, underlying a phylogeny of sequences sampled from that epidemic. Practically, R measures ongoing epidemic growth, while N is informative about historical caseload. Although the two models solve distinct problems, both describe their quantity of interest with a p-dimensional piecewise-constant function, and the reliability of their estimates depends on the choice of p. If p is misspecified, the model might underfit significant changes or overfit noise and so promote a spurious understanding of the epidemic, which might misguide intervention policies or misinform forecasts. Surprisingly, no transparent yet principled approach for optimising p exists. Usually, p is set heuristically or controlled obscurely via complex algorithms. We present a computable and interpretable p-selection method based on the minimum description length (MDL) formalism of information theory. Unlike many standard model selection techniques, MDL accounts for the additional statistical complexity induced by how parameters interact. As a result, our method optimises p so that R and N estimates properly adapt to the available data. It also outperforms the comparable Akaike and Bayesian information criteria on several classification problems. Our approach requires some knowledge of the parameter space and exposes the similarities between renewal and skyline models.
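To make the p-selection problem concrete, the following is a minimal Python sketch rather than the method of the paper. It assumes a Poisson renewal likelihood in which cases at time t have mean R times the total infectiousness Lambda_t, splits the time series into p equal-width segments with constant R, and scores each p with the Akaike and Bayesian criteria and with a Fisher-information-style MDL approximation derived here under those assumptions. The function names, the generation-time weights, the case counts and the bound r_max on R are all hypothetical, and the MDL expression may differ from the criterion the authors use.

```python
import math


def total_infectiousness(cases, weights):
    """Lambda_t = sum over s >= 1 of weights[s-1] * cases[t-s] (assumed weights)."""
    return [
        sum(w * cases[t - s] for s, w in enumerate(weights, start=1) if t - s >= 0)
        for t in range(len(cases))
    ]


def score_p(cases, lam, p, r_max=10.0):
    """Return (AIC, BIC, FIA) for p equal-width segments with constant R.

    Assumes cases[t] ~ Poisson(R_j * lam[t]) within segment j and that each
    R_j lies in (0, r_max]; r_max is an illustrative bound, not a known value.
    """
    # Only time points with positive infectiousness carry information on R.
    idx = [t for t in range(len(cases)) if lam[t] > 0]
    n = len(idx)
    if not 1 <= p <= n:
        raise ValueError("p must lie between 1 and the number of usable time points")

    log_lik = 0.0
    log_volume = 0.0
    for j in range(p):
        seg = idx[j * n // p:(j + 1) * n // p]
        sum_lam = sum(lam[t] for t in seg)
        r_hat = sum(cases[t] for t in seg) / sum_lam  # segment-wise MLE of R
        for t in seg:
            mean = r_hat * lam[t]
            if cases[t] > 0:
                log_lik += cases[t] * math.log(mean)
            log_lik -= mean + math.lgamma(cases[t] + 1)
        # Total Fisher information for R_j is sum_lam / R_j, so the integral of
        # its square root over (0, r_max] equals 2 * sqrt(sum_lam * r_max).
        log_volume += 0.5 * math.log(sum_lam) + math.log(2.0 * math.sqrt(r_max))

    aic = -2.0 * log_lik + 2.0 * p
    bic = -2.0 * log_lik + p * math.log(n)
    fia = -log_lik - 0.5 * p * math.log(2.0 * math.pi) + log_volume
    return aic, bic, fia


# Hypothetical daily case counts and generation-time weights, for illustration only.
cases = [5, 6, 8, 9, 12, 14, 17, 20, 24, 28, 26, 22, 19, 16, 14, 12, 10, 9, 7, 6]
weights = [0.2, 0.5, 0.3]
lam = total_infectiousness(cases, weights)
scores = {p: score_p(cases, lam, p) for p in (1, 2, 4)}
best_by_fia = min(scores, key=lambda p: scores[p][2])
print(scores, best_by_fia)
```

In this sketch the AIC and BIC penalties depend only on p and the number of usable time points, whereas the Fisher-information term also depends on the infectiousness within each segment and on the assumed range of R. That dependence is one way to read the abstract's remarks that MDL accounts for how parameters interact and requires some knowledge of the parameter space.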

[20]  Daniel L. Ayres,et al.  Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10 , 2018, Virus evolution.

[21]  M. Lipsitch,et al.  How generation intervals shape the relationship between growth rates and reproductive numbers , 2007, Proceedings of the Royal Society B: Biological Sciences.

[22]  M. Suchard,et al.  Improving the accuracy of demographic and molecular clock model comparison while accommodating phylogenetic uncertainty. , 2012, Molecular biology and evolution.

[23]  I. J. Myung,et al.  Toward a method of selecting among computational models of cognition. , 2002, Psychological review.

[24]  Jay I. Myung,et al.  Model selection by Normalized Maximum Likelihood , 2006 .

[25]  O. Pybus,et al.  The Epidemic Behavior of the Hepatitis C Virus , 2001, Science.

[26]  Louis du Plessis,et al.  Jointly Inferring the Dynamics of Population Size and Sampling Intensity from Molecular Sequences , 2019, bioRxiv.

[27]  P. Grünwald,et al.  Catching up faster by switching sooner: a predictive approach to adaptive estimation with an application to the AIC–BIC dilemma , 2012 .

[28]  Peter Beerli,et al.  Maximum likelihood estimation of a migration matrix and effective population sizes in n subpopulations by using a coalescent approach , 2001, Proceedings of the National Academy of Sciences of the United States of America.

[29]  C. Viboud,et al.  Explorer The genomic and epidemiological dynamics of human influenza A virus , 2016 .

[30]  Chieh-Hsi Wu,et al.  Are Skyline Plot-Based Demographic Estimates Overly Dependent on Smoothing Prior Assumptions? , 2020, bioRxiv.

[31]  P. Cochat,et al.  Et al , 2008, Archives de pediatrie : organe officiel de la Societe francaise de pediatrie.

[32]  Donald L. Snyder,et al.  Random Point Processes in Time and Space , 1991 .

[33]  J. Rissanen,et al.  Modeling By Shortest Data Description* , 1978, Autom..

[34]  Hans R. Künsch,et al.  Some Notes on Rissanen's Stochastic Complexity , 1998, IEEE Trans. Inf. Theory.

[35]  Anne-Mieke Vandamme,et al.  Tracing the origin and history of the HIV-2 epidemic , 2003, Proceedings of the National Academy of Sciences of the United States of America.

[36]  O. Pybus,et al.  An integrated framework for the inference of viral population history from reconstructed genealogies. , 2000, Genetics.

[37]  Wes Hinsley,et al.  A simple approach to measure transmissibility and forecast incidence , 2017, Epidemics.

[38]  K. Strimmer,et al.  Exploring the demographic history of DNA sequences using the generalized skyline plot. , 2001, Molecular biology and evolution.

[39]  M. Suchard,et al.  Smooth skyride through a rough skyline: Bayesian coalescent-based inference of population dynamics. , 2008, Molecular biology and evolution.

[40]  A. Drummond,et al.  Bayesian inference of population size history from multiple loci , 2008, BMC Evolutionary Biology.

[41]  R. Durbin,et al.  Inference of human population history from individual whole-genome sequences. , 2011, Nature.

[42]  J. Wallinga,et al.  Different Epidemic Curves for Severe Acute Respiratory Syndrome Reveal Similar Impacts of Control Measures , 2004, American journal of epidemiology.