An integrated framework for the joint inference of demographic history and sampling intensity from genealogies or genetic sequences

Estimating effective population size from a coalescent genealogy, reconstructed from sequences sampled longitudinally from that population, is an important problem in epidemiology and macroevolution. Here the population represents infected individuals across a viral epidemic or historical abundances of a species of interest. The coalescent and sample times delineate the branches and tips of the reconstructed genealogy. Popular skyline estimators use the coalescent times to infer population size, but presume that sample times are predetermined and uninformative. We question this assumption and formulate a new skyline method, termed the epoch sampling skyline plot (ESP), to rigorously incorporate sample time information. Our method uses an epochal sampling model in which the longitudinal sampling rate has a piecewise-constant, proportional dependence on population size, with constants of proportionality known as sampling intensities. We prove that the ESP can at least double the best precision achievable by standard skylines, while still fitting practical and flexible sampling scenarios. These include the widely used density-dependent and frequency-dependent protocols, which feature fixed sampling intensities or constant sample counts. We show that sampling intensities and population sizes can be jointly estimated, and that our estimates are markedly improved in periods where standard skyline methods are biased by long coalescent branches. We benchmark the ESP against existing approaches using simulated and empirical datasets, and provide efficient Bayesian (BEAST2) and maximum-likelihood implementations. Ignoring the sampling process disregards a rich source of information that could become increasingly important as data collection improves and intensifies.
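
To make the role of the sample times concrete, the following is a minimal sketch of how a single population-size estimate might change once sampling events are treated as informative. It is not the authors' implementation. It assumes one segment with constant population size N, a known sampling intensity beta, coalescent events occurring at rate C(k,2)/N for k lineages, and sampling events occurring as a Poisson process with rate beta*N. Under those assumptions the log-likelihood is -m*log(N) - A/N + s*log(beta*N) - beta*N*T, and setting its derivative to zero gives a quadratic in N. The function and argument names are hypothetical.

```python
import math


def skyline_mle(coal_load, n_coal):
    """Standard skyline-style estimate from coalescent events only.

    coal_load: sum over sub-intervals of C(k, 2) * dt (assumed precomputed)
    n_coal:    number of coalescent events in the segment
    """
    if n_coal <= 0:
        raise ValueError("need at least one coalescent event")
    return coal_load / n_coal


def sampling_aware_mle(coal_load, n_coal, n_samp, duration, beta):
    """Estimate of N that also uses sampling events (illustrative assumption).

    Solves beta*T*N^2 + (m - s)*N - A = 0 for its positive root, where
    m = n_coal, s = n_samp, T = duration and A = coal_load.
    """
    if beta <= 0 or duration <= 0:
        raise ValueError("beta and duration must be positive")
    a = beta * duration
    b = n_coal - n_samp
    disc = b * b + 4.0 * a * coal_load
    return (-b + math.sqrt(disc)) / (2.0 * a)


def score(n, coal_load, n_coal, n_samp, duration, beta):
    """Derivative of the assumed log-likelihood with respect to N."""
    return -n_coal / n + coal_load / n**2 + n_samp / n - beta * duration


if __name__ == "__main__":
    args = dict(coal_load=4.0, n_coal=1, n_samp=1, duration=1.0, beta=1.0)
    n_hat = sampling_aware_mle(**args)
    print("coalescent-only estimate:", skyline_mle(4.0, 1))
    print("sampling-aware estimate: ", n_hat)
    print("score at estimate:       ", score(n_hat, **args))
```

In this sketch the sampling-aware estimate depends on the number of sampling events and the segment duration as well as on the coalescent events, which is the sense in which sample times carry information about population size.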
