Expressed Sequence Tags With cDNA Termini: Previously Overlooked Resources for Gene Annotation and Transcriptome Exploration in Chlamydomonas reinhardtii

Many Chlamydomonas reinhardtii expressed sequence tags (ESTs) in GenBank dbEST and in community EST assemblies were either over- or undertrimmed with respect to their cDNA termini, the diagnostic sequence elements that delineate the 3′ and 5′ ends of mRNA transcripts. Overtrimming discards directional, positional, and structural information about transcript ends. Undertrimming leaves spurious sequence in ESTs, which degrades downstream EST-based applications. We examined 309,278 raw EST sequencing trace files of C. reinhardtii and found that only 57% had cDNA termini that matched the structures expected from their cDNA library construction while also meeting our minimum length requirement for the final clean sequence. Using GMAP, we successfully mapped 156,963 individual ESTs to the genome, thereby anchoring their in silico-verified cDNA termini to the genome. Our analysis suggests strong macro- and microheterogeneity in the 3′ and 5′ end positions of individual transcripts derived from the same gene in C. reinhardtii. By annotating the differing ends of individual transcripts on the draft genome, this work provides the research community with a new stream of data that will facilitate accurate determination of gene structures, genome annotation, and exploration of the transcriptome and mRNA metabolism in C. reinhardtii.
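The difference between over- and undertrimming can be made concrete with a minimal Python sketch of a 3′ terminus check. It assumes a sense-orientation read whose expected structure is insert, then a poly(A) tail, then a 3′ adapter. The adapter sequence, the thresholds `MIN_POLYA_RUN` and `MIN_CLEAN_LENGTH`, and the names `check_polya_terminus` and `TerminusCall` are hypothetical illustrations. They are not the study's actual library structures, cutoffs, or pipeline. The sketch avoids both failure modes. It does not delete the tail silently (overtrimming), and it does not leave the tail and adapter in the sequence (undertrimming). Instead, it removes both from the clean sequence and records the terminus as an explicit flag.

```python
from dataclasses import dataclass

# Hypothetical thresholds chosen for illustration only; they are not the
# cutoffs used in the study.
MIN_POLYA_RUN = 12      # consecutive A's required to call a poly(A) terminus
MIN_CLEAN_LENGTH = 100  # minimum length (nt) of the trimmed clean sequence


@dataclass
class TerminusCall:
    has_polya: bool  # a poly(A) run was found immediately 5' of the adapter
    clean_seq: str   # sequence left after removing adapter (and tail, if any)
    passes: bool     # terminus found AND clean sequence long enough


def check_polya_terminus(read: str, adapter: str) -> TerminusCall:
    """Check a sense-orientation read for a hypothetical expected layout:
    insert + poly(A) tail + 3' adapter."""
    read, adapter = read.upper(), adapter.upper()
    idx = read.find(adapter)
    if idx == -1:
        # No adapter found: the expected terminus structure is absent,
        # so nothing is trimmed and the read is not accepted.
        return TerminusCall(False, read, False)
    upstream = read[:idx]  # everything 5' of the adapter
    # Length of the A run immediately 5' of the adapter. Genome-encoded A's
    # next to the cleavage site are counted here too; separating them from
    # the true tail requires a genome alignment (e.g. with GMAP).
    tail_len = len(upstream) - len(upstream.rstrip("A"))
    if tail_len < MIN_POLYA_RUN:
        # Adapter present but no convincing tail: report it explicitly.
        return TerminusCall(False, upstream, False)
    clean = upstream[:len(upstream) - tail_len]
    return TerminusCall(True, clean, len(clean) >= MIN_CLEAN_LENGTH)


if __name__ == "__main__":
    ADAPTER = "GGATCCTCGAGC"  # hypothetical adapter sequence
    call = check_polya_terminus("ACGT" * 30 + "A" * 20 + ADAPTER, ADAPTER)
    print(call.has_polya, len(call.clean_seq), call.passes)  # True 120 True
```

Keeping `has_polya` separate from `clean_seq` is the point of the design. The downstream mapping step can then anchor the 3′ end position to the genome, even though the tail itself is no longer part of the clean sequence.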
