MASCOT: parameter and state inference under the marginal structured coalescent approximation

Motivation: The structured coalescent is widely applied to study demography within and migration between sub-populations from genetic sequence data. Current methods are either exact but too computationally inefficient to analyse large datasets with many sub-populations, or make strong approximations that lead to severe biases in inference. We recently introduced an approximation to the structured coalescent, based on weaker assumptions, that enables the analysis of larger datasets with many different states. We showed that our approximation provides unbiased migration rate and population size estimates across a wide parameter range.

Results: We extend this approach by providing a new algorithm to calculate the probability of the state of internal nodes that includes the information from the full phylogenetic tree. We show that this algorithm increases the probability attributed to the true sub-population of a node. Furthermore, we use improved integration techniques, such that our method is now able to analyse larger datasets, including an H3N2 dataset with 433 sequences sampled from five different locations.

Availability and implementation: The presented methods are part of the BEAST2 package MASCOT, the Marginal Approximation of the Structured COalescenT. This package can be downloaded via the BEAUti package manager. The source code is available at https://github.com/nicfel/Mascot.git.

Supplementary information: Supplementary data are available at Bioinformatics online.
