New Routes to Phylogeography: A Bayesian Structured Coalescent Approximation

Phylogeographic methods aim to infer migration trends and the history of sampled lineages from genetic data. Applications of phylogeography are broad, and in the context of pathogens include the reconstruction of transmission histories and the origin and emergence of outbreaks. Phylogeographic inference based on bottom-up population genetics models is computationally expensive, and as a result faster alternatives based on the evolution of discrete traits have become popular. In this paper, we show that inference of migration rates and root locations based on discrete trait models is extremely unreliable and sensitive to biased sampling. To address this problem, we introduce BASTA (BAyesian STructured coalescent Approximation), a new approach implemented in BEAST2 that combines the accuracy of methods based on the structured coalescent with the computational efficiency required to handle more than just a few populations. We illustrate the potentially severe implications of poor model choice for phylogeographic analyses by investigating the zoonotic transmission of Ebola virus. Whereas the structured coalescent analysis correctly infers that successive human Ebola outbreaks have been seeded by a large unsampled non-human reservoir population, the discrete trait analysis implausibly concludes that undetected human-to-human transmission has allowed the virus to persist over the past four decades. As genomics takes on an increasingly prominent role in informing the control and prevention of infectious diseases, it will be vital that phylogeographic inference provides robust insights into transmission history.
