PlantTribes: a gene and gene family resource for comparative genomics in plants

The PlantTribes database (http://fgp.huck.psu.edu/tribe.html) is a plant gene family database based on the inferred proteomes of five sequenced plant species: Arabidopsis thaliana, Carica papaya, Medicago truncatula, Oryza sativa and Populus trichocarpa. We used the graph-based clustering algorithm MCL [Van Dongen (Technical Report INS-R0010 2000) and Enright et al. (Nucleic Acids Res. 2002; 30: 1575–1584)] to classify all of these species’ protein-coding genes into putative gene families, called tribes, at three clustering stringencies (low, medium and high). For all tribes, we have generated protein and DNA alignments and maximum-likelihood phylogenetic trees. A parallel database of microarray experimental results is linked to the genes, which lets researchers identify groups of related genes and examine their expression patterns. Unified nomenclatures were developed, and tribes can be related to traditional gene families and conserved domain identifiers. SuperTribes, constructed through a second iteration of MCL clustering, connect distant but potentially related gene clusters. The global classification of nearly 200 000 plant proteins was used as a scaffold for sorting ∼4 million additional cDNA sequences from over 200 plant species. All data and analyses are accessible through a flexible interface that allows users to explore the classification, to place query sequences within the classification, and to download results for further study.
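
To illustrate the kind of graph clustering that MCL performs, the following is a minimal sketch in pure Python. It is not the PlantTribes pipeline or the MCL program itself: the function names (`mcl`, `normalize_columns`, `matmul`), the toy similarity matrix and the parameter defaults are all assumptions made for illustration. It also assumes that clustering stringency corresponds to the MCL inflation parameter, which is the usual control on cluster granularity; the abstract does not state the values used.

```python
def normalize_columns(m):
    """Scale each column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain matrix product of two square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(similarity, inflation=2.0, iterations=50, tol=1e-9):
    """Toy Markov clustering on a symmetric similarity matrix.

    `inflation` is assumed here to play the role of the stringency
    setting: higher values tend to give smaller, tighter clusters.
    """
    n = len(similarity)
    # Add self-loops, then convert similarities to transition probabilities.
    m = [[similarity[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        expanded = matmul(m, m)  # expansion: flow spreads along paths
        inflated = normalize_columns(
            [[v ** inflation for v in row] for row in expanded]
        )  # inflation: strong flows are boosted, weak flows decay
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Each row that retains flow defines a cluster from its nonzero columns.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Hypothetical example: six proteins forming two unconnected similarity groups.
toy = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
tribes = mcl(toy, inflation=2.0)
print(tribes)
```

In practice the similarity matrix would be derived from all-against-all sequence comparisons, and rerunning the same procedure with different inflation values would give classifications at different granularities.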
