The Structured Coalescent and Its Approximations

Abstract

Phylogeographic methods can help reveal the movement of genes between populations of organisms. They have been widely used to quantify pathogen movement between host populations, the migration history of humans, the geographic spread of languages, and gene flow between species, using the location or state of samples alongside sequence data. Phylogenies therefore offer insights into migration processes that are not available from classic epidemiological or occurrence data alone. Phylogeographic methods, however, have several known shortcomings. In particular, one of the most widely used methods treats migration the same as mutation and therefore does not incorporate information about population demography. This may lead to severe biases in estimated migration rates for data sets where sampling is biased across populations. The structured coalescent, on the other hand, allows us to model the migration and coalescent processes coherently, but current implementations struggle with complex data sets because they need to infer ancestral migration histories. Approximations to the structured coalescent that integrate over all ancestral migration histories have therefore been developed. However, the validity and robustness of these approximations remain unclear. We present an exact numerical solution to the structured coalescent that does not require the inference of migration histories. Although this solution is computationally infeasible for large data sets, it clarifies the assumptions of previously developed approximate methods and allows us to provide an improved approximation to the structured coalescent. We have implemented these methods in BEAST2, and we show how they compare under different scenarios.
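To make concrete what it means to model migration and coalescence together, the following is a minimal sketch of the backward-in-time event rates in the textbook structured coalescent, for a configuration in which the location of every lineage is known. It is not the exact numerical solution or the approximation described here, and it is not the BEAST2 implementation. The function name, argument names, and the parameterization (per-deme effective sizes, per-lineage backward migration rates) are assumptions made for illustration only.

```python
from math import comb


def structured_coalescent_rates(lineages, ne, mig):
    """Backward-in-time event rates for a known lineage configuration.

    Hypothetical helper for illustration only.
    lineages: number of lineages currently in each deme
    ne:       effective population size of each deme (assumed constant)
    mig:      mig[i][j] is the assumed per-lineage backward migration
              rate from deme i to deme j (diagonal entries are ignored)
    """
    n = len(lineages)
    # Each pair of lineages in deme i coalesces at rate 1 / ne[i].
    coal = [comb(lineages[i], 2) / ne[i] for i in range(n)]
    # Each lineage in deme i moves to deme j at rate mig[i][j].
    migr = [
        [lineages[i] * mig[i][j] if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    total = sum(coal) + sum(sum(row) for row in migr)
    return coal, migr, total


if __name__ == "__main__":
    coal, migr, total = structured_coalescent_rates(
        lineages=[3, 1],
        ne=[2.0, 1.0],
        mig=[[0.0, 0.5], [0.25, 0.0]],
    )
    print(coal, migr, total)
```

Because the coalescence rate in each deme depends on both the number of lineages and the effective population size there, demography enters the model directly, which is the information that is lost when migration is treated like mutation.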
