Delineation of coding areas in DNA sequences through assignment of codon probabilities.

Codon usage tables have been produced for E. coli, yeast, human, and mouse. Because codons are used nonrandomly, a probability value can be assigned to each trinucleotide in any DNA sequence. This value represents the probability that the trinucleotide is used as a codon in the organism from which the table is derived. For the graphical delineation of coding areas in DNA sequences, each trinucleotide is assigned a probability equal to its frequency in the codon table. Averaging and smoothing procedures then make areas of high average codon probability much easier to detect and give a better representation of the mean codon probability. These manipulations increase graphical clarity without altering the overall magnitude of the probabilities: averaging introduces an error of less than 0.5% between "raw" and smoothed data. This graphical delineation of coding sequences does not depend on the presence of punctuation, ribosomal binding sites, etc.; moreover, the delineation of introns and exons is also possible.
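
The procedure described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `toy_table` values, and the window size are hypothetical, and a plain centered moving average stands in for whatever averaging and smoothing procedures the paper actually uses. Each trinucleotide in a chosen reading frame is looked up in a codon frequency table, and the resulting profile is then smoothed.

```python
from typing import Dict, List


def trinucleotide_probabilities(seq: str, codon_freq: Dict[str, float],
                                frame: int = 0) -> List[float]:
    """Assign each trinucleotide in the given frame its codon-table frequency.

    Trinucleotides absent from the table (an assumption of this sketch)
    receive a probability of 0.0.
    """
    seq = seq.upper()
    return [codon_freq.get(seq[i:i + 3], 0.0)
            for i in range(frame, len(seq) - 2, 3)]


def moving_average(values: List[float], window: int = 3) -> List[float]:
    """Centered moving average; the window is truncated at both ends."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


# Hypothetical frequencies for illustration only, not real codon usage data.
toy_table = {"ATG": 0.02, "GCT": 0.03, "AAA": 0.01}

raw = trinucleotide_probabilities("ATGGCTAAA", toy_table, frame=0)
smooth = moving_average(raw, window=3)
```

Running the same two steps for frames 0, 1, and 2 and plotting the three smoothed profiles against sequence position would give one curve per reading frame. Under the abstract's description, a coding area would be expected to appear as a stretch of elevated mean probability in one of them.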
