cDNA2Genome: A tool for mapping and annotating cDNAs

BackgroundIn the last years several high-throughput cDNA sequencing projects have been funded worldwide with the aim of identifying and characterizing the structure of complete novel human transcripts. However some of these cDNAs are error prone due to frameshifts and stop codon errors caused by low sequence quality, or to cloning of truncated inserts, among other reasons. Therefore, accurate CDS prediction from these sequences first require the identification of potentially problematic cDNAs in order to speed up the posterior annotation process.ResultscDNA2Genome is an application for the automatic high-throughput mapping and characterization of cDNAs. It utilizes current annotation data and the most up to date databases, especially in the case of ESTs and mRNAs in conjunction with a vast number of approaches to gene prediction in order to perform a comprehensive assessment of the cDNA exon-intron structure. The final result of cDNA2Genome is an XML file containing all relevant information obtained in the process. This XML output can easily be used for further analysis such us program pipelines, or the integration of results into databases. The web interface to cDNA2Genome also presents this data in HTML, where the annotation is additionally shown in a graphical form. cDNA2Genome has been implemented under the W3H task framework which allows the combination of bioinformatics tools in tailor-made analysis task flows as well as the sequential or parallel computation of many sequences for large-scale analysis.ConclusionscDNA2Genome represents a new versatile and easily extensible approach to the automated mapping and annotation of human cDNAs. The underlying approach allows sequential or parallel computation of sequences for high-throughput analysis of cDNAs.

[1]  Temple F. Smith,et al.  Prediction of gene structure. , 1992, Journal of molecular biology.

[2]  Philip Lijnzaad,et al.  The Ensembl genome database project , 2002, Nucleic Acids Res..

[3]  G. Rubin,et al.  A computer program for aligning a cDNA sequence with a genomic DNA sequence. , 1998, Genome research.

[4]  Tom H. Pringle,et al.  The human genome browser at UCSC. , 2002, Genome research.

[5]  International Human Genome Sequencing Consortium Initial sequencing and analysis of the human genome , 2001, Nature.

[6]  E. Myers,et al.  Basic local alignment search tool. , 1990, Journal of molecular biology.

[7]  Anders Krogh,et al.  Two Methods for Improving Performance of a HMM and their Application for Gene Finding , 1997, ISMB.

[8]  D. Mccormick Sequence the Human Genome , 1986, Bio/Technology.

[9]  Lukas Wagner,et al.  A Greedy Algorithm for Aligning DNA Sequences , 2000, J. Comput. Biol..

[10]  Peter Ernst,et al.  A task framework for the web interface W2H , 2003, Bioinform..

[11]  R. Durbin,et al.  Using GeneWise in the Drosophila annotation experiment. , 2000, Genome research.

[12]  H. Mewes,et al.  Toward a catalog of human genes and proteins: sequencing and analysis of 500 novel complete protein coding human cDNAs. , 2001, Genome research.

[13]  J. P. Jenuth The NCBI. Publicly available tools and resources on the Web. , 2000, Methods in molecular biology.

[14]  P. Argos,et al.  SRS: information retrieval system for molecular biology data banks. , 1996, Methods in enzymology.

[15]  Y Sakaki,et al.  [Determination of DNA sequence of the whole chromosome 21]. , 2000, Tanpakushitsu kakusan koso. Protein, nucleic acid, enzyme.

[16]  M. Ashburner,et al.  Gene Ontology: tool for the unification of biology , 2000, Nature Genetics.

[17]  S. Karlin,et al.  Prediction of complete gene structures in human genomic DNA. , 1997, Journal of molecular biology.

[18]  Lincoln Stein,et al.  Genome annotation: from sequence to biology , 2001, Nature Reviews Genetics.

[19]  J. Gern The Sequence of the Human Genome , 2001, Science.

[20]  Peter Ernst,et al.  W2H: WWW interface to the GCG sequence analysis package , 1998, Bioinform..