TriFLDB: A Database of Clustered Full-Length Coding Sequences from Triticeae with Applications to Comparative Grass Genomics

The Triticeae Full-Length CDS Database (TriFLDB) compiles available information on full-length coding sequences (CDSs) of the Triticeae crops wheat (Triticum aestivum) and barley (Hordeum vulgare), together with functional annotations and comparative genomics features. Triticeae full-length CDS (TriFLCDS) entries and their annotations can be retrieved by keyword search on gene function and related Gene Ontology terms, or by similarity search with DNA or deduced amino acid sequences. Annotations consist of similarity search results against several sequence databases and domain structure predictions by InterProScan. The deduced amino acid sequences in TriFLDB are grouped with the proteome datasets of Arabidopsis (Arabidopsis thaliana), rice (Oryza sativa), and sorghum (Sorghum bicolor) by hierarchical clustering at stepwise thresholds of sequence identity, so the resulting clusters are based on full-length protein sequences. The database also provides the results of comparative mapping of TriFLCDSs onto the rice and sorghum genome sequences, which, together with the current annotations, can be used to predict gene structures for TriFLCDS entries. To indicate the possible genetic locations of full-length CDSs, TriFLCDS entries are also linked to the genetically mapped cDNA sequences of barley and diploid wheat held in the Triticeae Mapped EST Database. These relational data are searchable from the search interfaces of both databases. The current release of TriFLDB contains 15,871 full-length CDSs from barley and wheat, including putative full-length cDNAs for both species that are publicly accessible. This content provides an informatics gateway for Triticeae genomics and comparative grass genomics. TriFLDB is publicly available at http://TriFLDB.psc.riken.jp/.
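Hierarchical clustering at stepwise identity thresholds can be illustrated with a minimal Python sketch. The abstract does not say which clustering program or thresholds TriFLDB uses, so the sketch rests on several assumptions: a CD-HIT-style greedy incremental pass (the longest sequence becomes a cluster representative first), a Levenshtein-based identity proxy instead of a real tool's alignment-based identity, and hypothetical thresholds of 0.85 and 0.65. The representatives from one level are re-clustered at the next, looser threshold, and that is what produces the hierarchy. The sequence IDs and peptides are toy data, not TriFLDB entries.

```python
from typing import Dict, List, Sequence

# Toy peptides with hypothetical IDs; these are NOT TriFLDB entries.
TOY_SEQS: Dict[str, str] = {
    "wheat_A": "MKTAYIAKQR",
    "barley_A": "MKTAYIAKQK",
    "rice_A": "MKTAWIAKEK",
    "arab_B": "PPPPPPPPPP",
}


def identity(a: str, b: str) -> float:
    """Identity proxy: 1 - Levenshtein distance / length of the longer sequence.

    Assumption: a stand-in for the alignment-based identity a real
    clustering tool would compute.
    """
    if not a and not b:
        return 1.0
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # match or substitute
        prev = cur
    return 1.0 - prev[-1] / max(len(a), len(b))


def greedy_cluster(ids: Sequence[str], seqs: Dict[str, str],
                   threshold: float) -> Dict[str, List[str]]:
    """Greedy pass, longest sequences first: each sequence joins the first
    representative it matches at >= threshold, else it founds a new cluster."""
    clusters: Dict[str, List[str]] = {}
    for sid in sorted(ids, key=lambda s: (-len(seqs[s]), s)):
        for rep, members in clusters.items():
            if identity(seqs[sid], seqs[rep]) >= threshold:
                members.append(sid)
                break
        else:
            clusters[sid] = [sid]
    return clusters


def stepwise_cluster(seqs: Dict[str, str],
                     thresholds: Sequence[float]) -> List[Dict[str, List[str]]]:
    """Cluster from the strictest threshold to the loosest; only the
    representatives of one level are passed on to the next level."""
    levels: List[Dict[str, List[str]]] = []
    current = list(seqs)
    for t in sorted(thresholds, reverse=True):
        clusters = greedy_cluster(current, seqs, t)
        levels.append(clusters)
        current = list(clusters)
    return levels


def expand(levels: List[Dict[str, List[str]]], level: int, rep: str) -> List[str]:
    """All original sequence IDs under representative `rep` at `level` (0 = strictest)."""
    members = levels[level][rep]
    if level == 0:
        return sorted(members)
    return sorted(sid for m in members for sid in expand(levels, level - 1, m))


if __name__ == "__main__":
    levels = stepwise_cluster(TOY_SEQS, [0.85, 0.65])
    print(levels[0])  # {'arab_B': ['arab_B'], 'barley_A': ['barley_A', 'wheat_A'], 'rice_A': ['rice_A']}
    print(levels[1])  # {'arab_B': ['arab_B'], 'barley_A': ['barley_A', 'rice_A']}
    print(expand(levels, 1, "barley_A"))  # ['barley_A', 'rice_A', 'wheat_A']
```

With these toy inputs, wheat_A and barley_A (identity 0.9) share a cluster at the 0.85 level, rice_A joins that group at the 0.65 level (identity 0.8 to the barley_A representative), and arab_B remains a singleton. Every sequence is compared against every existing representative, so the cost grows roughly quadratically with the number of sequences. The sketch is meant to show the shape of the hierarchy, not to scale to proteome-sized datasets.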
