Efficient Bayesian inference under the structured coalescent

Motivation: Population structure significantly affects evolutionary dynamics. Such structure may be due to spatial segregation, but may also reflect any other gene-flow-limiting aspect of a model. In combination with the structured coalescent, this fact can be used to inform phylogenetic tree reconstruction and to infer parameters such as migration rates and subpopulation sizes from annotated sequence data. However, Bayesian inference under the structured coalescent is impeded by the difficulty of constructing Markov chain Monte Carlo (MCMC) sampling algorithms (samplers) capable of efficiently exploring the state space.

Results: In this article, we present a new MCMC sampler capable of sampling from posterior distributions over structured trees: timed phylogenetic trees in which each lineage is associated with the distinct subpopulation in which it lies. The sampler includes a set of MCMC proposal functions that offer significant mixing improvements over a previously published method. Furthermore, its implementation as a BEAST 2 package provides considerable flexibility in model and prior specification. We demonstrate the usefulness of the sampler by using it to infer migration rates and effective population sizes of H3N2 influenza between New Zealand, New York and Hong Kong from publicly available hemagglutinin (HA) gene sequences under the structured coalescent.

Availability and implementation: The sampler has been implemented as a publicly available BEAST 2 package that is distributed under version 3 of the GNU General Public License at http://compevol.github.io/MultiTypeTree.

Contact: tgvaughan@gmail.com

Supplementary information: Supplementary data are available at Bioinformatics online.
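To make the notion of a structured tree more concrete, the sketch below computes the log probability density of a fully specified structured tree under the structured coalescent, the kind of quantity an MCMC sampler over structured trees must evaluate repeatedly. It is a minimal illustration, not the MultiTypeTree implementation: the event-list representation, the `Event` class and the function name are hypothetical, and `mig_rates[i][j]` is assumed to be the backward-in-time rate at which a single lineage in subpopulation (deme) i moves to deme j. Between events the density decays exponentially at the total event rate, which is k_i(k_i - 1)/(2N_i) for coalescence plus k_i times the summed outgoing migration rates in each deme i; each event then contributes the rate of that specific event (1/N_i for a particular pair coalescing in deme i, m_ij for a particular lineage migrating from i to j).

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Event:
    """Hypothetical event record for a structured tree (times measured backward from the present)."""
    time: float                    # time before present; increases into the past
    kind: str                      # "sample", "coalescence" or "migration"
    deme: int                      # deme in which the event occurs (source deme for a migration)
    to_deme: Optional[int] = None  # destination deme (backward in time), migrations only


def structured_coalescent_log_density(
    events: List[Event],
    pop_sizes: Sequence[float],
    mig_rates: Sequence[Sequence[float]],
) -> float:
    """Log density of a structured tree given deme sizes N_i and backward migration rates m_ij."""
    n_demes = len(pop_sizes)
    k = [0] * n_demes  # number of lineages currently in each deme
    log_p = 0.0
    t_prev = 0.0
    for ev in sorted(events, key=lambda e: e.time):
        # Probability that no event occurs during the interval: exp(-total_rate * dt).
        total_rate = 0.0
        for i in range(n_demes):
            total_rate += k[i] * (k[i] - 1) / 2.0 / pop_sizes[i]
            total_rate += k[i] * sum(mig_rates[i][j] for j in range(n_demes) if j != i)
        log_p -= total_rate * (ev.time - t_prev)

        # Rate of the specific event that ends the interval, then update lineage counts.
        if ev.kind == "sample":
            k[ev.deme] += 1
        elif ev.kind == "coalescence":
            if k[ev.deme] < 2:
                raise ValueError("coalescence needs at least two lineages in the deme")
            log_p += math.log(1.0 / pop_sizes[ev.deme])
            k[ev.deme] -= 1
        elif ev.kind == "migration":
            if k[ev.deme] < 1 or ev.to_deme is None:
                raise ValueError("migration needs a lineage in the source deme and a destination")
            log_p += math.log(mig_rates[ev.deme][ev.to_deme])
            k[ev.deme] -= 1
            k[ev.to_deme] += 1
        else:
            raise ValueError(f"unknown event kind: {ev.kind}")
        t_prev = ev.time
    return log_p
```

For example, two lineages sampled at time 0 in demes 0 and 1, where the deme-1 lineage migrates into deme 0 at time 0.5 and the two then coalesce at time 1.5, would be passed as `[Event(0.0, "sample", 0), Event(0.0, "sample", 1), Event(0.5, "migration", 1, 0), Event(1.5, "coalescence", 0)]`. In an MCMC setting, a proposal that alters the migration history or coalescence times changes this event list, and the density is re-evaluated in the acceptance ratio.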
