Inferring time-dependent migration and coalescence patterns from genetic sequence and predictor data in structured populations

Population dynamics can be inferred from genetic sequence data using phylodynamic methods. These methods typically either quantify the dynamics of unstructured populations or, for structured populations, assume that the parameters describing the dynamics are constant through time. Inference methods that allow for both population structure and time-varying parameters involve many parameters, each of which may be only weakly informed by the data. Here we introduce an approach that uses so-called predictors, such as the geographic distance between locations, within a generalized linear model to inform the population dynamic parameters, namely the time-varying migration rates and effective population sizes under the marginal approximation of the structured coalescent. Using simulations, we show that the parameters can be reliably inferred from phylogenetic trees. We then apply this framework to a previously described Ebola virus dataset. We infer incidence to be the strongest predictor of effective population size and geographic distance to be the strongest predictor of migration. This shows, on real as well as simulated data, that the approach identifies reasonable predictors. Overall, we provide a novel method that identifies predictors of migration rates and effective population sizes and uses these predictors to quantify both. Its implementation as part of the BEAST2 software package MASCOT allows joint inference of the population dynamics within structured populations, the phylogenetic tree, and the evolutionary parameters.
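The abstract does not state the functional form of the generalized linear model. As a minimal sketch, assume the log-linear form with inclusion indicators that is common in phylodynamic predictor analyses: a rate equals a scaler times the exponential of a weighted sum of standardized predictors. The function names, the distance values, and the coefficient below are hypothetical illustrations and are not taken from MASCOT.

```python
import math


def standardize_log(values):
    """Log-transform a predictor, then shift to mean 0 and scale to sd 1."""
    logs = [math.log(v) for v in values]
    mean = sum(logs) / len(logs)
    sd = math.sqrt(sum((x - mean) ** 2 for x in logs) / len(logs))
    return [(x - mean) / sd for x in logs]


def glm_rates(predictors, coefficients, indicators, scaler=1.0):
    """Assumed log-linear GLM.

    rate_i = scaler * exp(sum_k indicator_k * coefficient_k * predictor_k[i])

    predictors   : list of predictor vectors, one value per rate
    coefficients : one effect size per predictor
    indicators   : 0/1 flags that switch each predictor on or off
    """
    n_rates = len(predictors[0])
    rates = []
    for i in range(n_rates):
        log_rate = sum(
            d * b * x[i]
            for x, b, d in zip(predictors, coefficients, indicators)
        )
        rates.append(scaler * math.exp(log_rate))
    return rates


# Hypothetical distances (km) for three pairs of locations.
distance = standardize_log([100.0, 200.0, 400.0])

# A negative coefficient makes migration rates decrease with distance.
rates_on = glm_rates([distance], coefficients=[-1.0], indicators=[1])

# With the indicator switched off, the predictor has no effect
# and every rate equals the scaler.
rates_off = glm_rates([distance], coefficients=[-1.0], indicators=[0])
```

In a Bayesian analysis, the coefficients and indicators would be sampled along with the other parameters, so the posterior support for each indicator measures how strongly the data favor including that predictor. Time-varying rates follow from supplying one predictor value per rate and time interval.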
