The future of supertrees: bridging the gap with supermatrices

The supertree and supermatrix frameworks have been cast as mutually exclusive approaches to the problem of large-scale phylogenetic inference. Despite often coming under severe criticism, the supertree approach has to date proven superior at deriving comprehensive phylogenetic estimates for many groups (e.g., mammals as a whole) because, by combining trees instead of characters, it is able to include more of the global phylogenetic database. The continued rapid growth in sequencing technologies, however, means that this advantage is time-limited, given that abundant sequence information will rapidly become available for many groups. What, then, does such a future hold for the supertree approach? In this paper, I argue that the supertree framework could continue to have a place in phylogenetic inference, albeit altered to play a subordinate role as part of a divide-and-conquer heuristic search strategy for large molecular supermatrices. However, the divide-and-conquer approach has yet to realize its theoretical advantages (in terms of both speed and accuracy) over more conventional heuristic search strategies. I discuss two potential supertree-related bottlenecks that appear to be limiting the performance of the divide-and-conquer approach and that can be viewed as problems for which solutions need to be sought.

Keywords: Supertree, supermatrix, phylogenetic inference.

Zusammenfassung (translated from German): Supertree and supermatrix frameworks arose as mutually exclusive methods for analysing the problem of large-scale phylogenetic inference. Despite considerable criticism, the supertree method has proven highly successful at constructing very large phylogenetic data sets (e.g., for mammals), because combining phylogenetic trees rather than characters allows information from the global phylogenetic database to be included. This methodological advantage diminishes, however, as the volume of genetic sequence data grows and becomes available within a very short time for ever more groups. What does this increase in sequence data mean for the future of supertree analysis? In what follows, I argue that supertrees will remain relevant for phylogenetic analyses, although they will play only a subordinate role as part of a divide-and-conquer heuristic procedure for large molecular supermatrices. The advantages that these divide-and-conquer procedures have, at least in theory, over conventional heuristic methods (e.g., speed and accuracy) have yet to prove themselves in practice, however. I discuss two possible problems arising from supertree analysis that limit the efficiency of divide-and-conquer procedures and for which new solution strategies need to be developed.

[1]  J. Gatesy,et al.  The supermatrix approach to systematics. , 2007, Trends in ecology & evolution.

[2]  O. Bininda-Emonds,et al.  The evolution of supertrees. , 2004, Trends in ecology & evolution.

[3]  J. L. Gittleman,et al.  A complete phylogeny of the whales, dolphins and even‐toed hoofed mammals (Cetartiodactyla) , 2005, Biological reviews of the Cambridge Philosophical Society.

[4]  Charles Semple,et al.  A supertree method for rooted trees , 2000, Discret. Appl. Math..

[5]  John Gatesy,et al.  Inconsistencies in arguments for the supertree approach: supermatrices versus supertrees of Crocodylia. , 2004, Systematic biology.

[6]  Alexandros Stamatakis,et al.  RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models , 2006, Bioinform..

[7]  David S. Johnson,et al.  Computers and Intractability: A Guide to the Theory of NP-Completeness , 1978 .

[8]  T. Fulton,et al.  Molecular phylogeny of the Arctoidea (Carnivora): effect of missing data on supertree and supermatrix analyses of multiple gene data sets. , 2006, Molecular phylogenetics and evolution.

[9]  O. Bininda-Emonds,et al.  Trees versus characters and the supertree/supermatrix "paradox". , 2004, Systematic biology.

[10]  M J Sanderson,et al.  Assessment of the accuracy of matrix representation with parsimony analysis supertree construction. , 2001, Systematic biology.

[11]  M Steel,et al.  Simple but fundamental limitations on supertree and consensus tree methods. , 2000, Systematic biology.

[12]  O. Gascuel,et al.  A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. , 2003, Systematic biology.

[13]  B. Baum Combining trees as a way of combining data sets for phylogenetic inference, and the desirability of combining gene trees , 1992 .

[14]  Joseph Felsenstein,et al.  The number of evolutionary trees , 1978 .

[15]  Tandy J. Warnow,et al.  Rec-I-DCM3: A Fast Algorithmic Technique for Reconstructing Large Phylogenetic Trees , 2004, IEEE Computer Society Computational Systems Bioinformatics Conference.

[16]  A. Purvis A composite estimate of primate phylogeny. , 1995, Philosophical transactions of the Royal Society of London. Series B, Biological sciences.

[17]  Vincent Berry,et al.  PhySIC_IST: cleaning source trees to infer more informative supertrees , 2008, BMC Bioinformatics.

[18]  M. Springer,et al.  A Critique of Matrix Representation with Parsimony Supertrees , 2004 .

[19]  W. D. de Jong,et al.  Phylogenetics. Which mammalian supertree to bark up? , 2001, Science.

[20]  Martin Vingron,et al.  TREE-PUZZLE: maximum likelihood phylogenetic analysis using quartets and parallel computing , 2002, Bioinform..

[21]  Kate E. Jones,et al.  Supertrees are a necessary not-so-evil: a comment on Gatesy et al. , 2003, Systematic biology.

[22]  M. Novacek,et al.  Mammalian phylogeny: Genes and supertrees , 2001, Current Biology.

[23]  J. Felsenstein CONFIDENCE LIMITS ON PHYLOGENIES: AN APPROACH USING THE BOOTSTRAP , 1985, Evolution; international journal of organic evolution.

[24]  Rob DeSalle,et al.  Resolution of a supertree/supermatrix paradox. , 2002, Systematic biology.

[25]  Daniel H. Huson,et al.  Solving Large Scale Phylogenetic Problems using DCM2 , 1999, ISMB.

[26]  F. Lapointe,et al.  Total evidence, consensus, and bat phylogeny: A distance-based approach. , 1999, Molecular phylogenetics and evolution.

[27]  Satish Rao,et al.  Quartets MaxCut: A Divide and Conquer Quartets Algorithm , 2010, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[28]  Alexandros Stamatakis,et al.  Taxon sampling versus computational complexity and their impact on obtaining the Tree of Life , 2007 .

[29]  K. Strimmer,et al.  Quartet Puzzling: A Quartet Maximum-Likelihood Method for Reconstructing Tree Topologies , 1996 .

[30]  M. Ragan Phylogenetic inference based on matrix representation of trees. , 1992, Molecular phylogenetics and evolution.

[31]  Bernard M. E. Moret,et al.  Performance of Supertree Methods on Various Data Set Decompositions , 2004 .

[32]  O. Bininda-Emonds,et al.  Novel versus unsupported clades: assessing the qualitative support for clades in MRP supertrees. , 2003, Systematic biology.

[33]  Daniel H. Huson,et al.  Disk-Covering, a Fast-Converging Method for Phylogenetic Tree Reconstruction , 1999, J. Comput. Biol..

[34]  E. N. Adams,et al.  N-trees as nestings: Complexity, similarity, and consensus , 1986 .

[35]  Sylvain Guillemot,et al.  PhySIC: a veto supertree method with desirable properties. , 2007, Systematic biology.