Amino acid translation program for full-length cDNA sequences with frameshift errors.

Here we present an amino acid translation program that suggests the positions of experimental frameshift errors and predicts amino acid sequences for full-length cDNA sequences with phred quality scores. The program generates many candidate sequences by introducing artificial insertions or deletions at low-accuracy positions of the original sequence. The validity of each candidate (the likelihood that it represents the actual protein) is evaluated with a score, V(a), that takes into account the Kozak consensus, preferred codon usage, and the position of the initiation codon, and the highest-scoring candidate is reported as the most probable sequence. To evaluate the software, we used a database of 612 cDNA sequences, 524 (86%) of which carried a total of 773 frameshift errors in the coding sequence. The software detected and corrected 48% of all frameshift errors, in 62% of the cDNA sequences that contained frameshift errors. The false positive rate of frameshift correction was 9%; that is, 91% of the suggested frameshifts were true.
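The candidate-generation idea can be illustrated with a toy sketch. This is not the published program: it assumes single-base edits only, a hypothetical phred cutoff of 20, and a simplified score (length of a complete ATG-initiated ORF plus a two-point Kozak bonus for a purine at -3 and G at +4) standing in for V(a), which also weighs codon usage. All function names (`low_quality_positions`, `candidates`, `best_orf`, `rank_candidates`) are illustrative.

```python
"""Toy sketch of frameshift-candidate generation and ranking.

Assumptions (not the published method): single-base edits only, a fixed
hypothetical phred cutoff, and a simplified score (ORF length + Kozak bonus)
standing in for V(a), which also weighs codon usage.
"""
from itertools import product

# Standard genetic code; codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def low_quality_positions(quals, cutoff=20):
    """Indices whose phred score falls below the hypothetical cutoff."""
    return [i for i, q in enumerate(quals) if q < cutoff]


def candidates(seq, quals, cutoff=20):
    """Yield (label, sequence): the original plus one-base edits at weak positions."""
    yield "original", seq
    for i in low_quality_positions(quals, cutoff):
        yield f"del{i}", seq[:i] + seq[i + 1:]  # remove a suspected extra base
        for base in "ACGT":
            yield f"ins{i}{base}", seq[:i] + base + seq[i:]  # restore a missing base


def translate_from(seq, start):
    """Translate codon by codon from `start`; return (protein, reached_stop)."""
    protein = []
    for i in range(start, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")  # "X" for codons containing N etc.
        if aa == "*":
            return "".join(protein), True
        protein.append(aa)
    return "".join(protein), False


def kozak_bonus(seq, start):
    """+1 for a purine at -3 and +1 for G at +4 (A of ATG is +1)."""
    bonus = 0
    if start >= 3 and seq[start - 3] in "AG":
        bonus += 1
    if start + 3 < len(seq) and seq[start + 3] == "G":
        bonus += 1
    return bonus


def best_orf(seq):
    """Return (score, protein) for the best complete ATG-initiated ORF, else (0, "")."""
    best = (0, "")
    start = seq.find("ATG")
    while start != -1:
        protein, complete = translate_from(seq, start)
        if complete:
            score = len(protein) + kozak_bonus(seq, start)
            if score > best[0]:
                best = (score, protein)
        start = seq.find("ATG", start + 1)
    return best


def rank_candidates(seq, quals, cutoff=20):
    """Score every candidate; the stable sort keeps generation order for ties."""
    scored = [(label, *best_orf(s)) for label, s in candidates(seq, quals, cutoff)]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    true_seq = "GCCACCATGGCTGCTGCTAAAGCTGCTTAA"  # encodes MAAAKAA
    read = true_seq[:15] + true_seq[16:]        # one base lost at index 15
    quals = [40] * len(read)
    quals[15] = 8                               # weak call next to the gap
    for label, score, protein in rank_candidates(read, quals):
        print(label, score, protein)
```

In this example the unedited read has no complete ORF, while all four insertions at index 15 restore the reading frame and tie with 7-residue proteins; only one of them (`ins15G`) recovers the true sequence. Breaking such ties is exactly what a codon-usage term, as included in V(a), would be needed for.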
