An Overview of Gene Identification: Approaches, Strategies, and Considerations

Modern biology is on the verge of ushering in a new era in science with the completion of the sequencing of the human genome in April 2003. Although often erroneously called the “post-genome era,” this milestone will in fact mark the beginning of the “genome era,” a time in which the availability of sequence data for many genomes will have a significant effect on how science is performed in the 21st century. This unit gives an overview of many of the currently available gene prediction methods and a general assessment of how well they work for various problems.
