HMMConverter 1.0: a toolbox for hidden Markov models

Hidden Markov models (HMMs) and their variants are widely used in bioinformatics applications that analyze and compare biological sequences. Designing a novel application requires the insight of a human expert to define the model's architecture. Implementing the prediction algorithms and the algorithms that train the model's parameters, however, can be time-consuming and error-prone. Here we present HMMConverter, a software package for setting up probabilistic HMMs and pair-HMMs, as well as generalized HMMs and generalized pair-HMMs. The user defines the model and the algorithms to be used in an XML file, which is then directly translated into efficient C++ code. The software package provides linear-memory prediction algorithms, such as the Hirschberg algorithm, together with banding and the integration of prior probabilities, and it is the first to present computationally efficient linear-memory algorithms for automatic parameter training. Users of HMMConverter can thus set up complex applications with a minimum of effort and perform parameter training and data analyses on large data sets.
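
To make concrete what a "prediction algorithm" for an HMM computes, the following is a minimal sketch of the standard Viterbi algorithm in Python. It is not HMMConverter's code or its XML interface: the function name `viterbi`, the two-state model (`AT_rich`, `GC_rich`) and all probabilities are hypothetical values chosen for illustration. The sketch stores a back-pointer for every sequence position, so it is the plain full-memory variant rather than one of the linear-memory algorithms described above.

```python
import math


def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return (best_log_prob, best_path) for a non-empty observation sequence."""
    # v[s] is the best log-probability of any state path ending in state s.
    v = {s: log_start[s] + log_emit[s][obs[0]] for s in states}
    back = []  # one dict of back-pointers per position after the first
    for symbol in obs[1:]:
        prev, v, ptr = v, {}, {}
        for s in states:
            # Best predecessor r for state s at this position.
            best = max(states, key=lambda r: prev[r] + log_trans[r][s])
            v[s] = prev[best] + log_trans[best][s] + log_emit[s][symbol]
            ptr[s] = best
        back.append(ptr)
    # Trace back from the best final state to recover the state path.
    last = max(states, key=lambda s: v[s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return v[last], path


def to_log(table):
    """Convert a dict of probabilities to log-probabilities."""
    return {key: math.log(p) for key, p in table.items()}


# Hypothetical toy model: two states with different base compositions.
STATES = ["AT_rich", "GC_rich"]
LOG_START = to_log({"AT_rich": 0.5, "GC_rich": 0.5})
LOG_TRANS = {
    "AT_rich": to_log({"AT_rich": 0.9, "GC_rich": 0.1}),
    "GC_rich": to_log({"AT_rich": 0.1, "GC_rich": 0.9}),
}
LOG_EMIT = {
    "AT_rich": to_log({"A": 0.4, "T": 0.4, "C": 0.1, "G": 0.1}),
    "GC_rich": to_log({"A": 0.1, "T": 0.1, "C": 0.4, "G": 0.4}),
}

if __name__ == "__main__":
    score, path = viterbi("AATTGGCC", STATES, LOG_START, LOG_TRANS, LOG_EMIT)
    print(score, path)
```

Working in log space avoids numerical underflow on long sequences. The list of back-pointers grows with the sequence length, which is the memory cost that linear-memory approaches are designed to reduce for long inputs.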
