Quantifying and Mitigating the Effect of Preferential Sampling on Phylodynamic Inference

Phylodynamics seeks to estimate effective population size fluctuations from molecular sequences of individuals sampled from a population of interest. One way to accomplish this task is to formulate an observed sequence data likelihood using a coalescent model for the sampled individuals’ genealogy and then either integrate over all possible genealogies via Monte Carlo or, less efficiently, condition on one genealogy estimated from the sequence data. However, when analyzing sequences sampled serially through time, current methods implicitly assume either that sampling times are fixed deterministically by the data collection protocol or that their distribution does not depend on the size of the population. Through simulation, we first show that, when sampling times do depend probabilistically on effective population size, estimation methods may be systematically biased. To correct for this deficiency, we propose a new model that explicitly accounts for preferential sampling by modeling the sampling times as an inhomogeneous Poisson process whose intensity depends on effective population size. We demonstrate that, in the presence of preferential sampling, our new model not only reduces bias but also improves estimation precision. Finally, we compare the performance of currently used phylodynamic methods with that of our proposed model on clinically relevant seasonal human influenza examples.
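
To make the idea of preferential sampling concrete, the following is a minimal simulation sketch, not the inference method itself. It draws sampling times from an inhomogeneous Poisson process by thinning. The intensity form `scale * ne(t) ** beta`, the seasonal trajectory `seasonal_ne`, and every function and parameter name are assumptions chosen for illustration; the abstract states only that the intensity depends on effective population size.

```python
import math
import random


def seasonal_ne(t):
    """Hypothetical effective population size trajectory with period 1.

    Illustrative only; ranges from 1 to 19 and is not taken from the paper.
    """
    return 10.0 + 9.0 * math.cos(2.0 * math.pi * t)


def sample_times(ne, t_max, scale, beta, ne_max, rng):
    """Draw sampling times on [0, t_max] by thinning.

    Assumed intensity: lambda(t) = scale * ne(t) ** beta, with beta >= 0.
    ne_max must be an upper bound on ne(t) over [0, t_max].
    beta = 0 gives sampling that ignores population size.
    """
    if beta < 0:
        raise ValueError("this sketch bounds the intensity only for beta >= 0")
    lam_max = scale * ne_max ** beta  # constant rate that dominates lambda(t)
    times = []
    t = 0.0
    while True:
        # Propose the next event from a homogeneous process with rate lam_max.
        t += rng.expovariate(lam_max)
        if t > t_max:
            break
        # Keep the proposal with probability lambda(t) / lam_max.
        if rng.random() < scale * ne(t) ** beta / lam_max:
            times.append(t)
    return times


rng = random.Random(1)
times = sample_times(seasonal_ne, t_max=5.0, scale=2.0, beta=1.0,
                     ne_max=19.0, rng=rng)
# Count samples that fall in the high-population half of each season.
in_peak = sum(1 for t in times if (t % 1.0) < 0.25 or (t % 1.0) >= 0.75)
print(len(times), "samples,", in_peak, "of them near seasonal peaks")
```

With `beta = 1.0`, most sampling times cluster where the assumed trajectory is large. This is the dependence between sampling times and population size that estimators treating sampling times as fixed or uninformative do not use.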
