GRAIL: a multi-agent neural network system for gene identification

Identifying genes within large regions of uncharacterized DNA is a difficult undertaking and is currently the focus of many research efforts. We describe a gene localization and modeling system called GRAIL, which is built on a multiple-sensor neural network approach. It localizes genes in anonymous DNA sequence by recognizing features related to protein-coding regions and to the boundaries of coding regions, and then combines the recognized features using a neural network system. Localized coding regions are then "optimally" parsed into a gene model. In years of extensive testing, GRAIL has consistently identified about 90% of the coding portions of test genes, with a false positive rate of about 10%. A number of genes for major genetic diseases have been located through the use of GRAIL, and over 1000 research laboratories worldwide use GRAIL on a regular basis to localize genes in their newly sequenced DNA.
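
The two stages described above, combining sensor outputs into a coding score and then parsing scored regions into a gene model, can be illustrated with a minimal sketch. This is not GRAIL's implementation: the single logistic unit stands in for a trained neural network, the dynamic program is a plain weighted-interval chaining that ignores reading frame and splice-site compatibility, and the names `combine_sensors` and `best_chain`, the weights, and the coordinates are all hypothetical.

```python
from bisect import bisect_left
from math import exp


def combine_sensors(features, weights, bias):
    """Combine sensor outputs into one score in (0, 1).

    Assumption: a single logistic unit stands in for a trained network.
    """
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + exp(-z))


def best_chain(candidates):
    """Pick non-overlapping candidate regions with maximal total score.

    candidates: list of (start, end, score) with inclusive coordinates.
    Returns (total_score, chain), the chain ordered by position.
    """
    cands = sorted(candidates, key=lambda c: c[1])
    ends = [c[1] for c in cands]
    # best[i] holds the best (total, chain) using only the first i candidates.
    best = [(0.0, [])]
    for i, (start, end, score) in enumerate(cands):
        # j = number of candidates that end strictly before this one starts.
        j = bisect_left(ends, start)
        take_total = best[j][0] + score
        if take_total > best[i][0]:
            best.append((take_total, best[j][1] + [(start, end, score)]))
        else:
            best.append(best[i])
    return best[-1]


if __name__ == "__main__":
    # Hypothetical sensor outputs for one window (not real GRAIL features).
    print(combine_sensors([0.8, 0.6, 0.9], [1.5, 1.0, 2.0], -2.0))
    # Hypothetical scored candidate regions.
    regions = [(0, 10, 0.9), (5, 20, 0.95), (15, 30, 0.8), (35, 50, 0.7)]
    print(best_chain(regions))
```

In the example, the highest-scoring single region (5 to 20) is dropped because it overlaps two regions whose combined score is larger, which is the kind of global trade-off a dynamic-programming parse makes and a greedy choice would miss.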
