MTML-msBayes: Approximate Bayesian comparative phylogeographic inference from multiple taxa and multiple loci with rate heterogeneity

Background

MTML-msBayes uses hierarchical approximate Bayesian computation (HABC) under a coalescent model to infer temporal patterns of divergence and gene flow across codistributed taxon-pairs. Under a model of multiple codistributed taxa that diverge into taxon-pairs with subsequent gene flow or isolation, one can estimate hyper-parameters that quantify the mean and variability in divergence times, or test models of migration and isolation. The software uses multi-locus DNA sequence data collected from multiple taxon-pairs and allows variation across taxa in demographic parameters as well as heterogeneity in DNA mutation rates across loci. The method also allows a flexible sampling scheme: different numbers of loci of varying length can be sampled from different taxon-pairs.

Results

Simulation tests reveal that power to distinguish temporal congruence from incongruence in divergence times across taxon-pairs increases with the number of loci. These results are robust to DNA mutation rate heterogeneity. Estimating mean divergence times and testing simultaneous divergence were less accurate with migration, but improved if one specified the correct migration model. Simulation validation tests demonstrated that one can detect the correct migration or isolation model with high probability, and that this HABC model testing procedure was greatly improved by incorporating a summary statistic originally developed for this task (Wakeley's ΨW). The method is applied to an empirical data set of three Australian avian taxon-pairs, and a result of simultaneous divergence with some subsequent gene flow is inferred.

Conclusions

To retain flexibility and compatibility with existing bioinformatics tools, MTML-msBayes is a pipeline software package consisting of Perl, C and R programs that are executed via the command line. Source code and binaries are available for download at http://msbayes.sourceforge.net/ under an open source license (GNU General Public License).
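The hierarchical ABC idea described in the Background can be illustrated with a minimal rejection-sampling sketch. This is not the MTML-msBayes pipeline (which is written in Perl, C and R and simulates under a coalescent model); it is a toy Python example under stated assumptions. The function names, the uniform hyper-priors, the Gaussian "simulator" standing in for coalescent simulation, and the sorted Euclidean distance are all hypothetical choices made only to show the structure: draw hyper-parameters, draw per-taxon-pair divergence times conditional on them, simulate summary statistics, and keep the draws closest to the observed data.

```python
import random
import statistics


def draw_from_hyperprior(n_pairs, rng):
    """Draw hyper-parameters, then per-pair divergence times given them.

    The uniform ranges are arbitrary illustrative priors (assumption).
    """
    mean_tau = rng.uniform(0.0, 10.0)  # hyper-parameter: mean divergence time
    spread = rng.uniform(0.0, 3.0)     # hyper-parameter: variability across pairs
    taus = [max(0.0, rng.gauss(mean_tau, spread)) for _ in range(n_pairs)]
    return mean_tau, spread, taus


def simulate_summaries(taus, noise_sd, rng):
    """Toy stand-in for a coalescent simulator (assumption):
    one noisy summary statistic per taxon-pair."""
    return [t + rng.gauss(0.0, noise_sd) for t in taus]


def distance(a, b):
    """Euclidean distance between two equal-length summary vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def habc_rejection(observed, n_draws=20000, keep_fraction=0.01,
                   noise_sd=0.5, seed=1):
    """Return the accepted (distance, mean_tau, spread) draws.

    Summary vectors are sorted before comparison so that taxon-pairs
    are treated as exchangeable (a simplifying assumption of this sketch).
    """
    rng = random.Random(seed)
    target = sorted(observed)
    draws = []
    for _ in range(n_draws):
        mean_tau, spread, taus = draw_from_hyperprior(len(observed), rng)
        sim = simulate_summaries(taus, noise_sd, rng)
        draws.append((distance(sorted(sim), target), mean_tau, spread))
    draws.sort(key=lambda d: d[0])
    n_keep = max(1, int(n_draws * keep_fraction))
    return draws[:n_keep]


if __name__ == "__main__":
    # Three taxon-pairs with nearly identical observed summaries.
    accepted = habc_rejection([5.0, 5.1, 4.9])
    print("posterior mean of mean_tau:",
          statistics.mean(d[1] for d in accepted))
    print("posterior mean of spread:",
          statistics.mean(d[2] for d in accepted))
```

In this toy setting, a small accepted `spread` relative to its prior corresponds to support for near-simultaneous divergence, which mirrors the kind of hyper-parameter inference the abstract describes.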
