MASCOT: Parameter and state inference under the marginal structured coalescent approximation

Motivation: The structured coalescent is widely applied to study demography within and migration between sub-populations from genetic sequence data. Current methods are either exact but too computationally inefficient to analyse large datasets with many states, or make strong approximations that lead to severe biases in inference. We recently introduced an approximation to the structured coalescent based on weaker assumptions, enabling the analysis of larger datasets with many different states. We showed that our approximation provides unbiased migration rate and population size estimates across a wide parameter range.

Results: We here extend this approach by providing a new algorithm to calculate the probability of the state of internal nodes that includes the information from the full phylogenetic tree. We show that this algorithm increases the probability attributed to the true node states. Furthermore, we use improved integration techniques, such that our method is now able to analyse larger datasets, including an H3N2 dataset with 433 sequences sampled from 5 different locations.

Availability: The methods presented here are combined into the BEAST2 package MASCOT, the Marginal Approximation of the Structured COalescenT. This package can be downloaded via the BEAUti package manager. The source code is available at https://github.com/nicfel/Mascot.git.
