Gene Organization in Rice Revealed by Full-Length cDNA Mapping and Gene Expression Analysis through Microarray

Rice (Oryza sativa L.) is a model organism for the functional genomics of monocotyledonous plants because its genome is considerably smaller than those of other monocotyledonous plants. Although highly accurate genome sequences of indica and japonica rice are available, additional resources such as full-length complementary DNA (FL-cDNA) sequences remain indispensable for comprehensive analyses of gene structure and function. We cross-referenced the 28.5K individual loci in the rice genome, defined by mapping 578K FL-cDNA clones, with the 56K loci predicted in the TIGR genome assembly. Based on annotation status and the presence of corresponding cDNA clones, genes were classified into 23K annotated expressed (AE) genes, 33K annotated non-expressed (ANE) genes, and 5.5K non-annotated expressed (NAE) genes. We developed a 60-mer oligonucleotide array to analyze gene expression from each locus. Analysis of gene structures and expression levels revealed that the general features of gene structure and expression of NAE and ANE genes differed considerably from those of AE genes. The results also suggested that the cloning efficiency of rice FL-cDNA is associated with the transcriptional activity of the corresponding genetic locus, although other factors may also have an effect. Comparison of FL-cDNA coverage among gene families suggested that FL-cDNAs from genes encoding rice- or eukaryote-specific domains, and from genes involved in regulatory functions, were difficult to produce in bacterial cells. Collectively, these results indicate that rice genes can be divided into distinct groups based on transcriptional activity and gene structure, and that a coverage bias in FL-cDNA clones exists owing to the incompatibility of certain eukaryotic genes with bacteria.
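The three-way classification described above can be illustrated with a minimal sketch. It assumes that the annotated loci and the FL-cDNA-defined loci have already been reconciled to shared locus identifiers, which in practice requires comparing genomic coordinates. The function names and the toy identifiers are hypothetical and are not taken from the study.

```python
from collections import Counter


def classify_locus(annotated: bool, has_cdna: bool) -> str:
    """Return the class label for one locus (hypothetical helper).

    AE  = annotated and supported by at least one FL-cDNA clone
    ANE = annotated but without a corresponding FL-cDNA clone
    NAE = supported by FL-cDNA but absent from the annotation
    """
    if annotated and has_cdna:
        return "AE"
    if annotated:
        return "ANE"
    if has_cdna:
        return "NAE"
    return "none"


def classify_loci(annotated_ids, cdna_ids):
    """Classify every locus seen in either set of identifiers."""
    annotated_ids = set(annotated_ids)
    cdna_ids = set(cdna_ids)
    return {
        locus: classify_locus(locus in annotated_ids, locus in cdna_ids)
        for locus in annotated_ids | cdna_ids
    }


# Toy identifiers, for illustration only.
annotated = {"locus1", "locus2", "locus3"}
with_cdna = {"locus2", "locus3", "locus4"}

labels = classify_loci(annotated, with_cdna)
counts = Counter(labels.values())
```

Under this scheme the AE and ANE counts sum to the number of annotated loci, and the AE and NAE counts sum to the number of FL-cDNA-defined loci, which matches the rounded figures in the abstract (23K + 33K = 56K and 23K + 5.5K = 28.5K).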
