PipeMaster: inferring population divergence and demographic history with approximate Bayesian computation and supervised machine-learning in R

Understanding population divergence involves testing diversification scenarios and estimating historical parameters, such as divergence time, population size and migration rate. There is, however, an immense space of possible highly parameterized scenarios that are difficult or impossible to solve analytically. To overcome this problem, researchers have used alternative simulation-based approaches, such as approximate Bayesian computation (ABC) and supervised machine learning (SML), to approximate posterior probabilities of hypotheses. In this study we demonstrate the utility of our newly developed R package for simulating summary statistics to perform ABC and SML inferences. We compare the power of the ABC and SML methods and the influence of the number of loci on the accuracy of inferences, and we show three empirical examples: (i) genomic data from Muller’s termite frog from South America; (ii) Sanger data from the cottonmouth; and (iii) Sanger data from the copperhead, both snakes from North America. We found that SML is more efficient than ABC: it is generally more accurate and needs fewer simulations to perform an inference. For the South American frog, we found support for a model of divergence without migration, with a recent bottleneck in one of the populations. For the cottonmouth we found support for divergence with migration and recent expansion, and for the copperhead we found support for divergence with migration and a recent bottleneck. Interestingly, with an SML method it was possible to achieve high accuracy in model selection even when several models were compared in a single inference. We also found higher accuracy when inferring parameters with SML.
