Improvement of whole-genome annotation of cereals through comparative analyses.

Rice is an important model species for the Poaceae and other monocotyledonous plants. With a near-complete, finished, and annotated rice genome available, we performed genome-level comparisons between rice and all plant species for which large genomic or transcriptomic data sets are available, to determine the utility of cross-species sequence for structural and functional annotation of the rice genome. Through comparative analyses with four plant genome sequence data sets and transcript assemblies from 185 plant species, we were able to confirm and improve the structural annotation of the rice genome. We found support for 38,109 (89.3%) of the 42,653 nontransposable element-related genes in the rice genome, in the form of a rice expressed sequence tag, a full-length cDNA, or a plant homolog from our comparative analyses. Although the majority of the putative homologs were obtained from Poaceae species, putative homologs were also identified in dicotyledonous angiosperms, gymnosperms, and other plants such as algae, moss, and fern. We identified a set of 7,669 rice genes lacking a putative homolog; these may be lineage-specific genes that evolved after speciation and have a role in species diversity. From our comparative alignments, we identified 487 genes that were most likely missed in the current rice genome annotation and another 500 genes whose structural annotation warrants review. We also demonstrated the utility of cross-species comparative alignments in identifying noncoding sequences and in confirming gene nesting in rice.
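The support figure reported above can be illustrated with a minimal sketch. It assumes that a gene counts as supported when it has at least one of the three evidence types named in the abstract (rice EST, full-length cDNA, or plant homolog). The `GeneEvidence` record, its field names, and the helper functions are hypothetical and are not taken from the study's actual pipeline.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class GeneEvidence:
    """Hypothetical per-gene evidence record (field names are assumptions)."""
    gene_id: str
    has_est: bool = False            # matched by a rice expressed sequence tag
    has_fl_cdna: bool = False        # matched by a full-length cDNA
    has_plant_homolog: bool = False  # putative homolog in another plant species

    @property
    def supported(self) -> bool:
        # Assumption: any single evidence type is enough to count as support.
        return self.has_est or self.has_fl_cdna or self.has_plant_homolog


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


def support_summary(genes: Iterable[GeneEvidence]) -> Tuple[int, int, float]:
    """Return (supported count, total count, percent supported)."""
    gene_list = list(genes)
    total = len(gene_list)
    supported = sum(1 for g in gene_list if g.supported)
    return supported, total, percent(supported, total) if total else 0.0


if __name__ == "__main__":
    # Counts taken from the abstract: 38,109 supported of 42,653 genes.
    print(percent(38109, 42653))  # 89.3

    # Toy records, invented purely for illustration.
    toy = [
        GeneEvidence("gene_a", has_est=True),
        GeneEvidence("gene_b", has_plant_homolog=True),
        GeneEvidence("gene_c"),
        GeneEvidence("gene_d", has_fl_cdna=True, has_est=True),
    ]
    print(support_summary(toy))  # (3, 4, 75.0)
```

Under this definition, a gene with transcript support but no cross-species homolog still counts as supported, which is why the number of genes lacking a putative homolog can exceed the number of genes lacking any support.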
