PoMo: An Allele Frequency-Based Approach for Species Tree Estimation

Incomplete lineage sorting can cause incongruence between the overall species-level phylogenetic tree and the phylogenetic trees of individual genes or genomic segments. If this incongruence is not accounted for, species tree estimation can be biased in several ways. Here, we present a simple maximum likelihood approach that accounts for ancestral variation and incomplete lineage sorting. We use a POlymorphisms-aware phylogenetic MOdel (PoMo) that we have recently shown to efficiently estimate mutation rates and fixation biases from within- and between-species variation data. We extend this model to perform efficient estimation of species trees. Using simulations, we test the performance of PoMo in several different scenarios of incomplete lineage sorting and compare it with existing methods in both accuracy and computational speed. In contrast to other approaches, our model does not use coalescent theory but is allele frequency based. We show that PoMo is well suited for genome-wide species tree estimation and that on such data it is more accurate than previous approaches.
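
To illustrate what "allele frequency based" can mean in practice, the following is a minimal sketch of a PoMo-style state space. It assumes a virtual population of n individuals and at most two alleles per site; these assumptions, the function name `pomo_states`, and the tuple representation are illustrative and are not specified in the abstract above.

```python
from itertools import combinations

NUCLEOTIDES = "ACGT"

def pomo_states(n):
    """Enumerate allele-frequency states for a virtual population of size n.

    Assumption (not stated in the abstract): each state is either fixed for
    one nucleotide or polymorphic for exactly two nucleotides.
    """
    # Fixed states: all n virtual individuals carry the same allele.
    states = [((a, n),) for a in NUCLEOTIDES]
    # Polymorphic states: allele a in i individuals, allele b in n - i.
    for a, b in combinations(NUCLEOTIDES, 2):
        for i in range(1, n):
            states.append(((a, i), (b, n - i)))
    return states

if __name__ == "__main__":
    states = pomo_states(10)
    # 4 fixed states plus 6 allele pairs times (n - 1) frequencies each.
    print(len(states))
```

Under these assumptions, a species is represented at each site by one such state rather than by a single nucleotide, which is how within-species variation can enter a phylogenetic likelihood without an explicit coalescent model.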
