
Advancing the State of the Art in Computational Gene Prediction

Current methods for computationally predicting the locations and intron-exon structures of protein-coding genes in eukaryotic DNA are largely based on probabilistic, state-based generative models such as hidden Markov models and their various extensions. Unfortunately, little attention has been paid to the optimality of these models for the gene-parsing problem. Furthermore, as the prevalence of alternative splicing in human genes becomes more apparent, the "one gene, one parse" discipline endorsed by virtually all current gene-finding systems becomes less attractive from a biomedical perspective. Because our ability to accurately identify all the isoforms of each gene in the genome is of direct importance to biomedicine, our ability to improve gene-finding accuracy both for human and non-human DNA clearly has a potential to significantly impact human health. In this paper we review current methods and suggest a number of possible directions for further research that may alleviate some of these problems and ultimately lead to better and more useful gene predictions.
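To make the "one gene, one parse" discipline concrete, the following is a minimal sketch of Viterbi decoding [18] on a toy hidden Markov model. It is not taken from the paper: the two states, the probability tables, and the function name `viterbi` are all hypothetical and chosen only for illustration. Real gene finders use far richer state spaces (exons, introns, splice sites, reading frames).

```python
import math

# Hypothetical two-state toy model; every number below is an illustrative
# assumption, not a parameter from the paper or from any real gene finder.
STATES = ("noncoding", "coding")
START = {"noncoding": 0.9, "coding": 0.1}
TRANS = {
    "noncoding": {"noncoding": 0.9, "coding": 0.1},
    "coding": {"noncoding": 0.1, "coding": 0.9},
}
EMIT = {
    "noncoding": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
    "coding": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
}


def viterbi(seq):
    """Return the single most probable state path for seq and its log probability."""
    # scores[s]: best log probability of any path that ends in state s
    # at the current position.
    scores = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    backpointers = []
    for base in seq[1:]:
        new_scores, pointers = {}, {}
        for s in STATES:
            # Pick the predecessor state that maximizes the path score into s.
            prev = max(STATES, key=lambda p: scores[p] + math.log(TRANS[p][s]))
            new_scores[s] = (
                scores[prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][base])
            )
            pointers[s] = prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state to recover the one reported parse.
    last = max(STATES, key=lambda s: scores[s])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, scores[last]
```

Calling `viterbi("GCGCGCGC")` returns one state label per base together with the log probability of that single path. Every other parse, however close in probability, is discarded, which is the limitation the abstract raises for genes with multiple isoforms.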

Uwe Ohler | William H. Majoros

[1] R Staden. Computer methods to locate signals in nucleic acid sequences , 1984, Nucleic Acids Res..

[2] Yan-Da Li,et al. Identifying splicing sites in eukaryotic RNA: support vector machine approach , 2003, Comput. Biol. Medicine.

[3] E. Birney,et al. EGASP: the human ENCODE Genome Annotation Assessment Project , 2006, Genome Biology.

[4] Piero Fariselli,et al. A new decoding algorithm for hidden Markov models improves the prediction of the topology of all-beta membrane proteins , 2005, BMC Bioinformatics.

[5] Christopher B. Burge,et al. Identification and analysis of alternative splicing events conserved in human and mouse , 2005 .

[6] J. C. Clemens,et al. Alternative Splicing of Drosophila Dscam Generates Axon Guidance Receptors that Exhibit Isoform-Specific Homophilic Binding , 2004, Cell.

[7] David Haussler,et al. Improved splice site detection in Genie , 1997, RECOMB '97.

[8] Lawrence R. Rabiner,et al. A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.

[9] Gunnar Rätsch,et al. Engineering Support Vector Machine Kernels That Recognize Translation Initiation Sites , 2000, German Conference on Bioinformatics.

[10] D. Rubin,et al. Maximum likelihood from incomplete data via the EM algorithm plus discussions on the paper , 1977 .

[11] R. Durbin,et al. Biological sequence analysis: Background on probability , 1998 .

[12] Piero Fariselli,et al. The posterior-Viterbi: a new decoding algorithm for hidden Markov models , 2005 .

[13] Andrew McCallum,et al. Gene Prediction with Conditional Random Fields , 2005 .

[14] Charles E. Chapple,et al. Diversity and functional plasticity of eukaryotic selenoproteins: identification and characterization of the SelJ family. , 2005, Proceedings of the National Academy of Sciences of the United States of America.

[15] S. Salzberg,et al. Computational gene finding in plants , 2004, Plant Molecular Biology.

[16] Ian Korf,et al. Gene finding in novel genomes , 2004, BMC Bioinformatics.

[17] Michael Ashburner,et al. Annotation of the Drosophila melanogaster euchromatic genome: a systematic review , 2002, Genome Biology.

[18] Andrew J. Viterbi,et al. Error bounds for convolutional codes and an asymptotically optimum decoding algorithm , 1967, IEEE Trans. Inf. Theory.

[19] Steven Salzberg,et al. An empirical analysis of training protocols for probabilistic gene finders , 2005, BMC Bioinformatics.

[20] Michael Q. Zhang,et al. Computational identification of promoters and first exons in the human genome , 2001, Nature Genetics.

[21] William H. Majoros,et al. Efficient implementation of a generalized pair hidden Markov model for comparative gene finding , 2005, Bioinform..

[22] A. Reymond,et al. Tandem chimerism as a means to increase protein complexity in the human genome. , 2005, Genome research.

[23] J. Felsenstein. Evolutionary trees from DNA sequences: A maximum likelihood approach , 1981, Journal of Molecular Evolution.

[24] David F. Burke,et al. PROVAT – a versatile tool for Voronoi tessellation analysis of protein structures and complexes , 2005, BMC Bioinformatics.

[25] S. Salzberg,et al. Improved microbial gene identification with GLIMMER. , 1999, Nucleic acids research.

[26] Ron Shamir,et al. Accurate identification of alternatively spliced exons using support vector machine , 2005, Bioinform..

[27] David Haussler,et al. Exploiting Generative Models in Discriminative Classifiers , 1998, NIPS.

[28] Jotun Hein,et al. Using hidden Markov models and observed evolution to annotate viral genomes , 2006, Bioinform..

[29] Jonathan E. Allen,et al. JIGSAW, GeneZilla, and GlimmerHMM: puzzling out the features of human genes in the ENCODE regions , 2006, Genome Biology.

[30] Thangavel Alphonse Thanaraj,et al. ASD: the Alternative Splicing Database , 2004, Nucleic Acids Res..

[31] Ian Korf,et al. MaskerAid : a performance enhancement to RepeatMasker , 2000, Bioinform..

[32] S. Salzberg,et al. Interpolated Markov models for eukaryotic gene finding. , 1999, Genomics.

[33] Kuldip K. Paliwal,et al. Automatic Speech and Speaker Recognition , 1996 .

[34] Haixu Tang,et al. Splicing graphs and EST assembly problem , 2002, ISMB.

[35] A. Krogh. Two methods for improving performance of an HMM application for gene finding , 1997 .

[36] David Haussler,et al. A Generalized Hidden Markov Model for the Recognition of Human Genes in DNA , 1996, ISMB.

[37] David Haussler,et al. Computational identification of evolutionarily conserved exons , 2004, RECOMB.

[38] Simon Cawley,et al. HMM sampling and applications to gene finding and alternative splicing , 2003, ECCB.

[39] Michael Q. Zhang,et al. A weight array method for splicing signal analysis , 1993, Comput. Appl. Biosci..

[40] Michael R. Brent,et al. Using Multiple Alignments to Improve Gene Prediction , 2005, RECOMB.

[41] Christopher B. Burge,et al. Recognition of Unknown Conserved Alternatively Spliced Exons , 2005, PLoS Comput. Biol..

[42] Yves Normandin. Maximum Mutual Information Estimation of Hidden Markov Models , 1996 .

[43] E. Uberbacher,et al. Locating protein-coding regions in human DNA sequences by a multiple sensor-neural network approach. , 1991, Proceedings of the National Academy of Sciences of the United States of America.

[44] Steven Salzberg,et al. Efficient decoding algorithms for generalized hidden Markov model gene finders , 2005, BMC Bioinformatics.

[45] Vladimir Vapnik,et al. Statistical learning theory , 1998 .

[46] Mario Stanke,et al. Gene prediction with a hidden Markov model and a new intron submodel , 2003, ECCB.

[47] Erik L. L. Sonnhammer,et al. An HMM posterior decoder for sequence feature prediction that includes homology information , 2005, ISMB.

[48] L. Pachter,et al. SLAM: cross-species gene finding and alignment with a generalized pair hidden Markov model. , 2003, Genome research.

[49] Terrence S. Furey,et al. The UCSC Genome Browser Database , 2003, Nucleic Acids Res..

[50] Thomas L. Madden,et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. , 1997, Nucleic acids research.

[51] Felix L. Chernousko,et al. Finding prokaryotic genes by the 'frame-by-frame' algorithm: targeting gene starts and overlapping genes , 1999, Bioinform..

[52] Gene W. Yeo,et al. Systematic Identification and Analysis of Exonic Splicing Silencers , 2004, Cell.

[53] Lalit R. Bahl,et al. Maximum mutual information estimation of hidden Markov model parameters for speech recognition , 1986, ICASSP '86. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[54] Gunnar Rätsch,et al. RASE: recognition of alternatively spliced exons in C.elegans , 2005, ISMB.

[55] Günther Ruske,et al. Discriminative training for continuous speech recognition , 1995, EUROSPEECH.