The Institute for Genomic Research Osa1 Rice Genome Annotation Database

We have developed a rice (Oryza sativa) genome annotation database (Osa1) that provides structural and functional annotation for this emerging model species. Using the sequence of O. sativa subsp. japonica cv Nipponbare from the International Rice Genome Sequencing Project, we constructed pseudomolecules, or virtual contigs, of the 12 rice chromosomes. Our most recent release, version 3, represents our third build of the pseudomolecules and is composed of 98% finished sequence. Genes were identified using a series of computational methods that were developed for Arabidopsis (Arabidopsis thaliana) and modified for use with the rice genome. In release 3 of our annotation, we identified 57,915 genes, of which 14,196 are related to transposable elements. Of the 43,719 nontransposable element-related genes, 18,545 (42.4%) were annotated with a putative function, 5,777 (13.2%) were annotated as encoding an expressed protein with no known function, and the remaining 19,397 (44.4%) were annotated as encoding a hypothetical protein. Multiple splice forms (5,873) were detected for 2,538 genes, resulting in a total of 61,250 gene models in the rice genome. We incorporated experimental evidence into 18,252 gene models to improve the quality of the structural annotation. Several types of functional data have been annotated for the rice genome, including alignment with genetic markers, assignment of gene ontologies, identification of flanking sequence tags, alignment with homologs from related species, and syntenic mapping with other cereal species. All structural and functional annotation data are available through interactive search and display windows as well as through download of flat files. To integrate the data with other genome projects, the annotation data are also available through a Distributed Annotation System and a Genome Browser. All data can be obtained through the project Web pages at http://rice.tigr.org.
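The counts reported for release 3 can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the variable names are hypothetical, and it assumes that the 5,873 splice forms include the primary model of each of the 2,538 alternatively spliced genes, which is the reading under which the stated total of 61,250 gene models is reproduced.

```python
# Counts taken from the abstract (release 3 of the annotation).
total_genes = 57_915
te_related = 14_196
non_te = total_genes - te_related  # nontransposable element-related genes

# Functional categories of the non-TE genes.
categories = {
    "putative function": 18_545,
    "expressed protein": 5_777,
    "hypothetical protein": 19_397,
}
percentages = {
    name: round(100 * count / non_te, 1) for name, count in categories.items()
}

# Assumption: the 5,873 splice forms already count one primary model for
# each of the 2,538 alternatively spliced genes, so only the surplus
# (5,873 - 2,538) adds new gene models on top of the gene count.
splice_forms = 5_873
genes_with_splicing = 2_538
gene_models = total_genes + (splice_forms - genes_with_splicing)

print(non_te, percentages, gene_models)
```

Under that assumption, the three categories sum to the non-TE gene count and the percentages match those quoted above.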
