An Algorithm for Highly Specific Recognition of Protein-coding Regions

Since absolutely reliable recognition of protein-coding regions in eukaryote genomic DNA sequences by computational methods is unattainable, most existing algorithms try to keep some balance between underprediction and overprediction. However, in experimental practice it is often sufficient to have just a few protein-coding segments, but predicted with high specificity, that is, with (almost) no overprediction. Such predictions are then used for construction of oligonucleotide probes and PCR primers for analysis of cDNA libraries or total cellular RNA. Here we present a combinatorial algorithm solving this problem. Unlike other prediction schemes, the algorithm uses only the simplest statistical parameters (codon usage and positional nucleotide sequences in splicing sites) and thus can be used for analysis of obscure genomes, when large learning sets are unavailable. The algorithm's structure allows one to simply tune it for various experimental settings.
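
To make the specificity trade-off concrete, the following is a minimal sketch of how a codon-usage statistic could be combined with a deliberately strict threshold so that only the most confidently coding windows are reported. It is not the combinatorial algorithm presented here: the function names, the uniform 1/64 background, the pseudocount, the window length, and the threshold are all assumptions chosen for illustration, and splice-site statistics are left out entirely.

```python
import math
from collections import Counter
from itertools import product

# All 64 codons over the DNA alphabet.
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]


def codon_log_odds(coding_seqs, pseudocount=1.0):
    """Log-odds of each codon: smoothed frequency in a (possibly small)
    set of in-frame coding sequences versus an assumed uniform background."""
    counts = Counter()
    for seq in coding_seqs:
        for i in range(0, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if set(codon) <= set("ACGT"):  # skip codons with ambiguous bases
                counts[codon] += 1
    total = sum(counts.values()) + pseudocount * len(CODONS)
    background = 1.0 / len(CODONS)
    return {
        c: math.log(((counts[c] + pseudocount) / total) / background)
        for c in CODONS
    }


def window_score(window, log_odds):
    """Mean codon log-odds of a window read in frame 0 of the window."""
    codons = [window[i:i + 3] for i in range(0, len(window) - 2, 3)]
    scores = [log_odds[c] for c in codons if c in log_odds]
    return sum(scores) / len(scores) if scores else float("-inf")


def high_specificity_windows(seq, log_odds, window=30, threshold=1.0):
    """Return (start, score) for windows whose score reaches the threshold.

    Raising the threshold reports fewer windows (more underprediction)
    but makes false positives less likely (less overprediction)."""
    hits = []
    for start in range(len(seq) - window + 1):
        score = window_score(seq[start:start + window], log_odds)
        if score >= threshold:
            hits.append((start, score))
    return hits
```

In this sketch the threshold is the single tuning knob: for probe or primer design one would set it high and accept that most true coding segments go unreported, whereas a balanced gene predictor would lower it.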
