The Conserved Exon Method for Gene Finding

A new approach to gene finding, the "Conserved Exon Method" (CEM), is introduced. It looks for conserved protein sequences by comparing pairs of DNA sequences, identifies putative exon pairs from conserved regions and splice junction signals, and then chains pairs of putative exons together. It simultaneously predicts gene structures in both human and mouse genomic sequences (or in other pairs of sequences at an appropriate evolutionary distance). Experimental results indicate the potential usefulness of this approach.
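The chaining step can be illustrated with a minimal sketch. This is not the authors' implementation: the `ExonPair` record, its half-open coordinates, and its single `score` field are hypothetical, and the sketch assumes putative exon pairs have already been identified and scored. It ignores reading-frame compatibility and splice-signal checks and simply picks the highest-scoring set of pairs that appear in the same order, without overlap, in both sequences.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ExonPair:
    """Hypothetical putative exon pair; coordinates are half-open [start, end)."""
    h_start: int   # start in the human sequence
    h_end: int     # end in the human sequence
    m_start: int   # start in the mouse sequence
    m_end: int     # end in the mouse sequence
    score: float   # assumed conservation/splice-signal score


def best_chain(pairs: List[ExonPair]) -> List[ExonPair]:
    """Return the highest-scoring chain of pairs that are collinear and
    non-overlapping in both sequences (simple O(n^2) dynamic programming)."""
    if not pairs:
        return []
    # Any valid predecessor ends before the current pair starts in human,
    # so sorting by human start puts every predecessor at a lower index.
    ordered = sorted(pairs, key=lambda p: (p.h_start, p.m_start))
    best = [p.score for p in ordered]   # best chain score ending at each pair
    prev = [-1] * len(ordered)          # back-pointers for traceback
    for j, cur in enumerate(ordered):
        for i in range(j):
            cand = ordered[i]
            follows = cand.h_end <= cur.h_start and cand.m_end <= cur.m_start
            if follows and best[i] + cur.score > best[j]:
                best[j] = best[i] + cur.score
                prev[j] = i
    # Trace back from the best-scoring chain end.
    j = max(range(len(ordered)), key=best.__getitem__)
    chain = []
    while j != -1:
        chain.append(ordered[j])
        j = prev[j]
    chain.reverse()
    return chain
```

A full gene-structure model would also constrain which pairs may follow one another (for example by exon type and reading frame); the quadratic loop here is chosen only for readability.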
