A method for identifying splice sites and translational start sites in eukaryotic mRNA

This paper describes a new method for determining the consensus sequences that signal the start of translation and the boundaries between exons and introns (donor and acceptor sites) in eukaryotic mRNA. Unlike the usual technique, which treats each position independently, the method takes into account the dependencies between adjacent bases. Coupling it with a dynamic program that computes the most likely sequence yields new consensus sequences. The consensus information is summarized in conditional probability matrices which, when used to locate signals in uncharacterized genomic DNA, have greater sensitivity and specificity than conventional matrices. Species-specific versions of these matrices are especially effective at distinguishing true sites from false ones.
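To make the idea concrete, the following is a minimal sketch of one way conditional probability matrices and the dynamic program could be set up. It is not the paper's implementation. It assumes a first-order dependency (each base conditioned only on the base immediately before it), add-one pseudocounts, and a small invented set of aligned donor-like sites. The names `train`, `log_prob`, `most_likely_sequence`, and `TOY_SITES` are hypothetical.

```python
import math

BASES = "ACGT"

# Invented toy alignment for illustration only; not data from the paper.
TOY_SITES = ["AGGTAAGT", "AGGTGAGT", "CGGTAAGT", "AGGTAAGA", "TGGTGAGT"]


def train(sites, pseudo=1.0):
    """Estimate P(x_0) and P(x_i | x_{i-1}) from equal-length aligned sites."""
    n = len(sites[0])
    # first[b] approximates P(x_0 = b), smoothed with a pseudocount.
    first = {b: pseudo for b in BASES}
    for s in sites:
        first[s[0]] += 1
    total = sum(first.values())
    first = {b: c / total for b, c in first.items()}
    # cond[i][a][b] approximates P(x_i = b | x_{i-1} = a) for i >= 1.
    cond = [None]  # index 0 is unused: position 0 has no predecessor
    for i in range(1, n):
        table = {a: {b: pseudo for b in BASES} for a in BASES}
        for s in sites:
            table[s[i - 1]][s[i]] += 1
        for a in BASES:
            row_total = sum(table[a].values())
            table[a] = {b: c / row_total for b, c in table[a].items()}
        cond.append(table)
    return first, cond


def log_prob(seq, first, cond):
    """Log-probability of seq under the position-specific first-order model."""
    lp = math.log(first[seq[0]])
    for i in range(1, len(seq)):
        lp += math.log(cond[i][seq[i - 1]][seq[i]])
    return lp


def most_likely_sequence(first, cond):
    """Dynamic program: the single sequence with the highest probability."""
    # best[b] = (log-probability, prefix) of the best prefix ending in base b.
    best = {b: (math.log(first[b]), b) for b in BASES}
    for i in range(1, len(cond)):
        best = {
            b: max(
                (best[a][0] + math.log(cond[i][a][b]), best[a][1] + b)
                for a in BASES
            )
            for b in BASES
        }
    return max(best.values())[1]


first, cond = train(TOY_SITES)
consensus = most_likely_sequence(first, cond)
print(consensus, round(log_prob(consensus, first, cond), 3))
```

In this sketch, a candidate window in genomic DNA would be scored with `log_prob`, ideally as a log-odds ratio against a background model trained the same way on false sites. The consensus returned by the dynamic program can differ from the one obtained by picking the most frequent base at each position independently, because each choice depends on the preceding base.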
