Large-scale analysis of full-length cDNAs from the tomato (Solanum lycopersicum) cultivar Micro-Tom, a reference system for the Solanaceae genomics

Background: The Solanaceae family includes several economically important vegetable crops. The tomato (Solanum lycopersicum) is regarded as a model plant of the Solanaceae family. Recently, a number of tomato resources have been developed in parallel with the ongoing tomato genome sequencing project. In particular, a miniature cultivar, Micro-Tom, is regarded as a model system in tomato genomics, and a number of genomics resources in the Micro-Tom background, such as ESTs and mutagenized lines, have been established by an international alliance.

Results: To accelerate progress in tomato genomics, we developed a collection of 13,227 fully sequenced Micro-Tom full-length cDNAs. After checking for redundant sequences, coding sequences, and chimeric sequences, a set of 11,502 non-redundant full-length cDNAs (nrFLcDNAs) was generated. Analysis of untranslated regions demonstrated that tomato has longer 5'- and 3'-untranslated regions than most other plants except rice. Functional classification of the proteins predicted from the coding sequences demonstrated that the nrFLcDNAs cover a broad range of functions. A comparison of the nrFLcDNAs with genes of sixteen plants facilitated the identification of tomato genes that are not found in other plants, most of which did not have known protein domains. Mapping of the nrFLcDNAs onto currently available tomato genome sequences facilitated prediction of exon-intron structure. Introns of tomato genes were longer than those of Arabidopsis and rice. Based on a comparison of exon sequences between the nrFLcDNAs and the tomato genome sequences, the frequency of nucleotide mismatches in exons between Micro-Tom and the genome-sequencing cultivar (Heinz 1706) was estimated to be 0.061%.

Conclusion: The collection of Micro-Tom nrFLcDNAs generated in this study will serve as a valuable genomic tool for plant biologists to bridge the gap between basic and applied studies. The nrFLcDNA sequences will help annotate the tomato whole-genome sequence and aid in tomato functional genomics and molecular breeding. Full-length cDNA sequences and their annotations are provided in the database KaFTom (http://www.pgb.kazusa.or.jp/kaftom/) via the website of the National Bioresource Project Tomato (http://tomato.nbrp.jp).
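As an illustration of how an exon mismatch frequency such as the 0.061% figure can be defined, the following minimal Python sketch counts mismatched bases over pre-aligned exon pairs. It is not the authors' pipeline: the function name `mismatch_frequency`, the input format (equal-length aligned strings), and the choice to skip gap and ambiguous positions are all assumptions made for the example, and the sequences are toy data.

```python
from typing import Iterable, Tuple


def mismatch_frequency(aligned_exons: Iterable[Tuple[str, str]]) -> float:
    """Return the percentage of mismatched bases over all compared positions.

    Each item is a (cDNA exon, genomic exon) pair that has already been
    aligned to equal length. Positions where either sequence has a gap or
    an ambiguous base are skipped (an assumption of this sketch).
    """
    compared = 0
    mismatched = 0
    for cdna, genome in aligned_exons:
        if len(cdna) != len(genome):
            raise ValueError("aligned exon pair must have equal length")
        for a, b in zip(cdna.upper(), genome.upper()):
            if a not in "ACGT" or b not in "ACGT":
                continue  # skip gaps and ambiguous bases
            compared += 1
            if a != b:
                mismatched += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * mismatched / compared


# Toy data (hypothetical): 20 compared bases, 1 mismatch.
toy_pairs = [
    ("ATGGCTAGCT", "ATGGCTAGCT"),
    ("GGCATTCGAA", "GGCATTCGTA"),
]
print(mismatch_frequency(toy_pairs))  # 5.0
```

On real data the pairs would come from a spliced alignment of each nrFLcDNA to the genome, and the result depends on how gaps, low-quality bases, and alignment ends are handled.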
