ESTimating plant phylogeny: lessons from partitioning

Background: While expressed sequence tags (ESTs) have proven a viable and efficient way to sample genomes, particularly those for which whole-genome sequencing is impractical, phylogenetic analysis using ESTs remains difficult. Sequencing errors and orthology determination are the major problems when using ESTs as a source of characters for systematics. Here we develop methods to incorporate EST sequence information in a simultaneous analysis framework to address controversial phylogenetic questions regarding the relationships among the major groups of seed plants. We use an automated, phylogenetically derived approach to orthology determination called OrthologID, generate a phylogeny based on 43 process partitions, many of which are derived from ESTs, and examine several measures of support to assess the utility of EST data for phylogenies.

Results: A maximum parsimony (MP) analysis resulted in a single tree with relatively high support at all nodes, despite rampant conflict among trees generated from the separate analysis of individual partitions. In a comparison of broader-scale groupings based on cellular compartment (i.e., chloroplast, mitochondrial or nuclear) or function, only the nuclear partition tree (based largely on EST data) was topologically identical to the tree based on the simultaneous analysis of all data. Despite topological conflict among the broader-scale groupings examined, only the tree based on morphological data showed statistically significant differences.

Conclusion: Based on the amount of character support contributed by EST data, which make up a majority of the nuclear data set, and the lack of conflict between the nuclear data set and the simultaneous analysis tree, we conclude that the inclusion of EST data provides a viable and efficient approach to addressing phylogenetic questions within a parsimony framework on a genomic scale, provided that problems of orthology determination and potential sequencing errors can be overcome. In addition, approaches that examine conflict and support in a simultaneous analysis framework allow for a more precise understanding of the evolutionary history of individual process partitions and may be a novel way to understand functional aspects of different cellular classes of gene products.
