Disk covering methods improve phylogenomic analyses

Motivation: With the rapid growth in the number of newly sequenced genomes, species tree inference from multiple genes has become a basic bioinformatics task in comparative and evolutionary biology. However, accurate species tree estimation is difficult in the presence of gene tree discordance, which is often due to incomplete lineage sorting (ILS), modelled by the multi-species coalescent. Several highly accurate coalescent-based species tree estimation methods have been developed over the last decade, including MP-EST. However, the running time of MP-EST increases rapidly as the number of species grows.

Results: We present divide-and-conquer techniques that improve the scalability of MP-EST so that it can run efficiently on large datasets. Surprisingly, these techniques also improve the accuracy of species trees estimated by MP-EST, as our study shows on a collection of simulated and biological datasets.
