HMMConverter 1.0: a toolbox for hidden Markov models

Hidden Markov models (HMMs) and their variants are widely used in bioinformatics applications that analyze and compare biological sequences. Designing a novel application requires the insight of a human expert to define the model's architecture. Implementing the prediction algorithms and the algorithms that train the model's parameters, however, can be a time-consuming and error-prone task. Here we present HMMConverter, a software package for setting up probabilistic HMMs and pair-HMMs as well as generalized HMMs and generalized pair-HMMs. The user defines the model and the algorithms to be used in an XML file, which is then directly translated into efficient C++ code. The package provides linear-memory prediction algorithms such as the Hirschberg algorithm, supports banding and the integration of prior probabilities, and is the first to present computationally efficient linear-memory algorithms for automatic parameter training. Users of HMMConverter can thus set up complex applications with a minimum of effort and can also perform parameter training and data analyses on large data sets.
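To illustrate what "linear-memory" means in this setting, the following minimal sketch computes the likelihood of a sequence with the forward algorithm while storing only one column of the dynamic-programming table. It is not HMMConverter's implementation or interface: the function name `forward_likelihood`, the toy two-state model, and its probabilities are all assumptions made for illustration, and the sketch assumes a non-empty observation sequence.

```python
def forward_likelihood(obs, init, trans, emit):
    """Return P(obs) under an HMM, keeping only one column of forward values.

    obs   : non-empty list of observed symbol indices
    init  : init[s] is the probability of starting in state s
    trans : trans[s][t] is the probability of moving from state s to state t
    emit  : emit[s][x] is the probability that state s emits symbol x
    """
    n_states = len(init)
    # Forward column for the first observation.
    prev = [init[s] * emit[s][obs[0]] for s in range(n_states)]
    for symbol in obs[1:]:
        # Each new column depends only on the previous one, so memory use
        # is proportional to the number of states, not the sequence length.
        curr = [
            emit[t][symbol] * sum(prev[s] * trans[s][t] for s in range(n_states))
            for t in range(n_states)
        ]
        prev = curr
    return sum(prev)


# Hypothetical two-state model over the symbols 0 and 1 (illustrative values).
init = [0.5, 0.5]
trans = [[0.9, 0.1], [0.2, 0.8]]
emit = [[0.7, 0.3], [0.1, 0.9]]
likelihood = forward_likelihood([0, 1, 1], init, trans, emit)
```

For long sequences, a practical implementation would work in log space or rescale each column to avoid numerical underflow; the sketch omits this to keep the memory argument visible.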

Irmtraud M. Meyer | Tin Yin Lam
