Investigating EA Based Training of HMM using a Sequential Parameter Optimization Approach

Hidden Markov models (HMMs) have become an increasingly useful tool for the analysis of biological data. HMM-based tools are currently used for generating protein sequence profiles, predicting protein secondary structure, finding motifs in DNA sequence data, and many other bioinformatics applications. HMM training involves estimating the model parameters from an existing set of data. Such models are often constructed using gradient-descent-based training methods such as the Baum-Welch (BW) learning algorithm or the segmental K-means algorithm. Evolutionary algorithms (EAs) have also been applied to this problem, but have typically been observed to perform best when combined with BW learning to form a hybrid approach. In this work we describe a sequential parameter optimization approach for investigating the effectiveness of EAs for training HMMs. We discuss preliminary results obtained with this approach on synthetic DNA data sets. This approach not only offers the possibility of improving the effectiveness of the EA but will also provide much-needed insight into directions for future improvements in the design of EAs for the construction of HMMs in general.
