Quartet MaxCut: a fast algorithm for amalgamating quartet trees.

Accurate phylogenetic reconstruction methods are computationally expensive and are therefore limited to relatively small numbers of taxa. Supertree construction is the task of amalgamating small trees over partial taxon sets into one large tree over the complete taxon set. The need for fast and accurate supertree methods has become crucial due to the enormous number of new genomic sequences generated by modern technology and the desire to use them for classification purposes. In particular, the Assembling the Tree of Life (ATOL) program aims at constructing the evolutionary history of all living organisms on Earth. When dealing with unrooted trees, a quartet (an unrooted tree over four taxa) is the most basic piece of phylogenetic information. Quartet amalgamation therefore stands at the heart of any supertree problem, as it combines many minimal pieces of information into a single, coherent, and more comprehensive one. We have devised an extremely fast algorithm for quartet amalgamation and implemented it in highly efficient code. The new code can handle over a hundred million quartet trees over several hundred taxa with very high accuracy.
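
To make the notion of a quartet concrete, the following minimal Python sketch shows one possible way to represent quartets and to count how many of them a candidate tree satisfies, which is the quantity a quartet amalgamation method tries to make large. It is an illustration only and not the algorithm or code described above: the tree encoding (a list of nontrivial splits, each given as one side of the bipartition) and the names `satisfies` and `count_satisfied` are assumptions introduced here.

```python
from typing import FrozenSet, Iterable, Tuple

# A quartet ab|cd is stored as ((a, b), (c, d)). (Assumed encoding.)
Quartet = Tuple[Tuple[str, str], Tuple[str, str]]
# A split is one side of the bipartition induced by an internal tree edge.
Split = FrozenSet[str]


def satisfies(splits: Iterable[Split], quartet: Quartet) -> bool:
    """Return True if some split separates {a, b} from {c, d}."""
    (a, b), (c, d) = quartet
    for side in splits:
        ab_inside = a in side and b in side
        cd_inside = c in side and d in side
        ab_outside = a not in side and b not in side
        cd_outside = c not in side and d not in side
        if (ab_inside and cd_outside) or (cd_inside and ab_outside):
            return True
    return False


def count_satisfied(splits: Iterable[Split], quartets: Iterable[Quartet]) -> int:
    """Count the input quartets that the tree (given by its splits) satisfies."""
    split_list = list(splits)
    return sum(1 for q in quartets if satisfies(split_list, q))


# Hypothetical example: the unrooted tree ((a,b),c,(d,e)) has two
# nontrivial splits, ab|cde and de|abc.
tree_splits = [frozenset({"a", "b"}), frozenset({"d", "e"})]
input_quartets = [
    (("a", "b"), ("c", "d")),  # agrees with split ab|cde
    (("a", "c"), ("d", "e")),  # agrees with split de|abc
    (("a", "d"), ("b", "c")),  # conflicts with the tree
]
satisfied = count_satisfied(tree_splits, input_quartets)
```

In this toy input, `satisfied` counts the quartets that are consistent with the candidate tree; an amalgamation method searches for a tree over all taxa for which this count is as high as possible.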
