Assessment of the Accuracy of Matrix Representation with Parsimony Analysis Supertree Construction

Despite the growing popularity of supertree construction for combining phylogenetic information to produce more inclusive phylogenies, large-scale performance testing of this method has not been done. Through simulation, we tested the accuracy of the most widely used supertree method, matrix representation with parsimony analysis (MRP), with respect to a (maximum parsimony) total evidence solution and a known model tree. When source trees overlap completely, MRP provided a reasonable approximation of the total evidence tree; agreement was usually >85%. Performance improved slightly when using smaller, more numerous, or more congruent source trees, and especially when elements were weighted in proportion to the bootstrap frequencies of the nodes they represented on each source tree (“weighted MRP”). Although total evidence always estimated the model tree slightly better than nonweighted MRP methods, weighted MRP in turn usually outperformed total evidence slightly. When source studies were even moderately nonoverlapping (i.e., sharing only three-quarters of the taxa), the high proportion of missing data caused a loss in resolution that severely degraded the performance for all methods, including total evidence. In such cases, even combining more trees, which had positive effects elsewhere, did not improve accuracy. Instead, “seeding” the supertree or total evidence analyses with a single largely complete study improved performance substantially. This finding could be an important strategy for any studies that seek to combine phylogenetic information. Overall, our results suggest that MRP supertree construction provides a reasonable approximation of a total evidence solution and that weighted MRP should be used whenever possible. [Accuracy; matrix representation; missing data; MRP; phylogenetic supertrees; resolution; taxonomic congruence; total evidence.]
1 Current address and address for correspondence: Institute of Evolutionary and Ecological Sciences, Leiden University, Kaiserstraat 63, 9516, 2300 RA Leiden, The Netherlands; E-mail: bininda@rulsfb.leidenuniv.nl

Supertree construction (sensu Sanderson et al., 1998) represents an increasingly popular technique for combining phylogenetic information. Large-scale supertrees already exist for all extant species of the mammalian orders Primates (Purvis, 1995a; Purvis and Webster, 1999) and Carnivora (Bininda-Emonds et al., 1999), for the major clades within the legume subfamily Papilionoideae (Wojciechowski et al., 2000), and for the family-level relationships of all extant mammals (Liu et al., 2001). Furthermore, supertree construction has been identified as the key to producing comprehensive phylogenies for problematic clades (e.g., the kinetoplastid protozoa Trypanosomatidae; Stothard, 2000). The appeal of supertrees lies in their ability to synthesize many smaller, disparate sources of phylogenetic information into a single more-encompassing, but still well-resolved tree. This is especially true of one supertree method, matrix representation using parsimony (MRP; Baum, 1992; Ragan, 1992; also Brooks, 1981; Doyle, 1992).

In many cases, comprehensive phylogenetic estimates of an entire group cannot otherwise be obtained by conventional phylogenetic methods. For instance, primary analysis or total evidence (sensu Kluge, 1989) requires the combined data to be compatible, whereas taxonomic congruence (sensu Mickevich, 1978) requires that the studies possess the same set of taxa. Supertrees combine the positive aspects of these two approaches while avoiding their individual shortcomings. Like taxonomic congruence, supertree construction utilizes tree topologies and thus allows phylogenetic estimates derived from all possible data sources (which are often incompatible) to be combined, usually retaining good resolution while doing so (Purvis, 1995b). Like total evidence, supertree construction can combine estimates with different sets of terminal taxa to obtain a solution that contains statements of phylogenetic relationship that are not present in any single source study. Overall, supertree construction seems to show great promise for phylogenetic inference and for the ultimate goal of estimating the tree of life using all available information.
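For readers unfamiliar with the matrix representation step underlying MRP (Baum, 1992; Ragan, 1992), the following is a minimal sketch of the standard binary coding: each internal node of a source tree becomes one matrix element, scored 1 for taxa descended from that node, 0 for other taxa in the same source tree, and ? for taxa absent from that tree. Optional per-node weights stand in for the bootstrap frequencies used in weighted MRP. The nested-tuple tree format and the names `leaves`, `clades`, `mrp_matrix`, and `missing_fraction` are assumptions made for illustration only and are not taken from the study; the parsimony search on the resulting matrix is not shown.

```python
def leaves(node):
    """Return the set of taxon names below a node (a name or a tuple of nodes)."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaves(child) for child in node))


def clades(node, is_root=True):
    """Yield the taxon set of every internal node except the root."""
    if isinstance(node, str):
        return
    if not is_root:
        yield leaves(node)
    for child in node:
        yield from clades(child, is_root=False)


def mrp_matrix(trees, supports=None):
    """Encode rooted source trees as a binary matrix with missing entries.

    trees    : list of nested tuples of taxon names (hypothetical format).
    supports : optional list (one dict per tree) mapping a clade's frozenset
               of taxa to its bootstrap frequency; unlisted clades get 1.0.
    Returns (matrix, weights): matrix maps taxon -> string over '0', '1', '?';
    weights holds one value per matrix column.
    The all-zero rooting row that MRP analyses commonly add is omitted here.
    """
    all_taxa = sorted(set().union(*(leaves(tree) for tree in trees)))
    columns, weights = [], []
    for i, tree in enumerate(trees):
        present = leaves(tree)
        for clade in clades(tree):
            column = {}
            for taxon in all_taxa:
                if taxon not in present:
                    column[taxon] = "?"   # taxon not sampled in this source tree
                elif taxon in clade:
                    column[taxon] = "1"   # taxon descends from this node
                else:
                    column[taxon] = "0"   # taxon sampled but outside the clade
            columns.append(column)
            weights.append(1.0 if supports is None else supports[i].get(clade, 1.0))
    matrix = {t: "".join(col[t] for col in columns) for t in all_taxa}
    return matrix, weights


def missing_fraction(matrix):
    """Proportion of '?' cells in the matrix (0.0 for an empty matrix)."""
    total = sum(len(row) for row in matrix.values())
    missing = sum(row.count("?") for row in matrix.values())
    return missing / total if total else 0.0


if __name__ == "__main__":
    tree1 = (("A", "B"), ("C", "D"))          # taxa A-D
    tree2 = ((("B", "C"), "D"), "E")          # taxa B-E, partial overlap
    matrix, weights = mrp_matrix([tree1, tree2])
    for taxon, row in matrix.items():
        print(taxon, row)
    print("missing:", missing_fraction(matrix))
```

With these two toy trees, taxon A is scored `10??` and taxon E `??00`, so one-fifth of the cells are missing. This shows on a small scale how incomplete taxon overlap among source trees translates into missing data in the combined matrix.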

[1] M. Steel et al. Distributions of Tree Comparison Metrics—Some New Results, 1993.

[2] A. Purvis. A composite estimate of primate phylogeny, 1995, Philosophical transactions of the Royal Society of London. Series B, Biological sciences.

[3] Michael M. Miyamoto et al. Molecular and Morphological Supertrees for Eutherian (Placental) Mammals, 2001, Science.

[4] Andrew Rambaut et al. Seq-Gen: an application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees, 1997, Comput. Appl. Biosci.

[5] A. Rodrigo et al. Likelihood-based tests of topologies in phylogenetics, 2000, Systematic biology.

[6] Allen G. Rodrigo et al. A comment on Baum's method for combining phylogenetic trees, 1993.

[7] M. Ragan. Phylogenetic inference based on matrix representation of trees, 1992, Molecular phylogenetics and evolution.

[8] Michael J. Sanderson et al. Molecular Phylogeny of the "Temperate Herbaceous Tribes" of Papilionoid Legumes: A Supertree Approach, 2000.

[9] E. - et al. Properties of Matrix Representation with Parsimony Analyses, 2000.

[10] D. Maddison et al. NEXUS: an extensible file format for systematic information, 1997, Systematic biology.

[11] Andy Purvis et al. Phylogenetic supertrees: Assembling the trees of life, 1998, Trends in ecology & evolution.

[12] J. Doyle et al. Gene Trees and Species Trees: Molecular Systematics as One-Character Taxonomy, 1992.

[13] I. Kitching. Cladistics: The Theory and Practice of Parsimony Analysis, 1998.

[14] Dan Gusfield et al. Efficient algorithms for inferring evolutionary trees, 1991, Networks.

[15] D. Swofford. When are phylogeny estimates from molecular and morphological data incongruent, 1991.

[16] W. Maddison. Reconstructing Character Evolution on Polytomous Cladograms, 1989, Cladistics: the international journal of the Willi Hennig Society.

[17] Andy Purvis et al. A Modification to Baum and Ragan's Method for Combining Phylogenetic Trees, 1995.

[18] B. Baum. Combining trees as a way of combining data sets for phylogenetic inference, and the desirability of combining gene trees, 1992.

[19] J. Ohn et al. Does Adding Characters with Missing Data Increase or Decrease Phylogenetic Accuracy?, 2003.

[20] J. L. Gittleman et al. Building large trees by combining phylogenetic information: a complete phylogeny of the extant Carnivora (Mammalia), 1999, Biological reviews of the Cambridge Philosophical Society.

[21] D. H. Colless et al. Predictivity and Stability in Classifications: some Comments on Recent Studies, 1981.

[22] N. Platnick et al. On Missing Entries in Cladistic Analysis, 1991.

[23] M. Ragan et al. Reply to A. G. Rodrigo's "A Comment on Baum's Method for Combining Phylogenetic Trees", 1993.

[24] A. Kluge. A Concern for Evidence and a Phylogenetic Hypothesis of Relationships among Epicrates (Boidae, Serpentes), 1989.

[25] Dan Gusfield et al. Algorithms on Strings, Trees, and Sequences - Computer Science and Computational Biology, 1997.

[26] J. Felsenstein. Confidence Limits on Phylogenies: An Approach Using the Bootstrap, 1985, Evolution; international journal of organic evolution.

[27] Daniel R. Brooks et al. Hennig's Parasitological Method: A Proposed Solution, 1981.

[28] F. Ronquist. Matrix representation of trees, redundancy, and weighting, 1996.

[29] Donald H. Colless et al. Congruence Between Morphometric and Allozyme Data for Menidia Species: A Reappraisal, 1980.

[30] A. Purvis et al. Comparative Primate Socioecology: Phylogenetically independent comparisons and primate phylogeny, 1999.

[31] Future trypanosomatid phylogenies: refined homologies, supertrees and networks, 2000, Memorias do Instituto Oswaldo Cruz.

[32] Mark Wilkinson et al. Coping with Abundant Missing Entries in Phylogenetic Inference Using Parsimony, 1995.

[33] Henri Poincaré et al. Second Complément à l'Analysis Situs, 1900.

[34] D. Robinson et al. Comparison of phylogenetic trees, 1981.

[35] A. Rodrigo. On combining cladograms, 1996.