Reference-based gene model prediction on DNA contigs (extended abstract)

This paper presents an algorithm for constructing multiple gene models on a set of contigs of a large genomic clone. The algorithm first uses pattern recognition-based methods to locate exons or partial exons in each contig, and then uses protein homology or EST information from sequence databases as reference models to parse the predicted exons into gene models. In the gene model construction phase, the algorithm uses a single unified framework that covers the full range of cases, from genes with homologous proteins or ESTs in the database to genes with none. By exploiting protein homology or EST information, the algorithm is able to (1) parse exons into multiple gene models over a set of DNA contigs (possibly unoriented and unordered); (2) remove falsely predicted exons; and (3) identify and locate exons missed by the initial exon prediction.
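To make the gene model construction step more concrete, the following is a minimal sketch, not the algorithm of the paper. It assumes that each predicted exon carries a pattern-recognition score and a reference-support value (positive when the exon agrees with a homologous protein or EST, negative when it conflicts, zero when no reference exists), and it uses a generic dynamic program to pick the best-scoring chain of non-overlapping exons on a single, already oriented contig. The names `Exon`, `ref_support`, `ref_weight`, and `best_gene_model`, and the scoring scheme itself, are hypothetical. Recovery of missed exons and the handling of unordered or unoriented contigs are not covered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exon:
    start: int                 # first base of the predicted exon
    end: int                   # last base (inclusive)
    score: float               # pattern-recognition score (hypothetical scale)
    ref_support: float = 0.0   # agreement with a reference protein/EST (hypothetical)


def best_gene_model(exons, ref_weight=1.0):
    """Return (chain, total): the highest-scoring chain of non-overlapping exons.

    Assumption: an exon's combined score is score + ref_weight * ref_support.
    With no reference available, ref_support is 0 and the same code still
    works, which mirrors the idea of one framework for both situations.
    """
    ordered = sorted(exons, key=lambda e: e.end)
    n = len(ordered)
    if n == 0:
        return [], 0.0

    best = [0.0] * n    # best[i]: best total of a chain that ends with exon i
    prev = [-1] * n     # back-pointer to the previous exon in that chain
    for i, exon in enumerate(ordered):
        own = exon.score + ref_weight * exon.ref_support
        best[i] = own
        for j in range(i):
            # exon j may precede exon i only if it ends before exon i starts
            if ordered[j].end < exon.start and best[j] + own > best[i]:
                best[i] = best[j] + own
                prev[i] = j

    last = max(range(n), key=best.__getitem__)
    if best[last] <= 0:
        return [], 0.0  # no chain is worth reporting

    total = best[last]
    chain = []
    while last != -1:
        chain.append(ordered[last])
        last = prev[last]
    chain.reverse()
    return chain, total


# Illustrative input, not data from the paper.
candidates = [
    Exon(100, 200, 0.9, 0.8),
    Exon(150, 260, 0.6, 0.0),    # overlaps the first exon, no reference support
    Exon(300, 400, 0.7, 0.9),
    Exon(500, 560, 0.3, -0.5),   # contradicted by the reference
]
model, total = best_gene_model(candidates)
print([(e.start, e.end) for e in model], round(total, 2))
```

In this toy input the overlapping, unsupported exon and the exon contradicted by the reference both drop out of the selected chain, which illustrates in a simplified way how reference information can remove falsely predicted exons.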
