A Variable Initialization Approach to the EM Algorithm for Better Estimation of the Parameters of Hidden Markov Model Based Acoustic Modeling of Speech Signals

The parameters of Hidden Markov Model (HMM) based acoustic models of speech are traditionally estimated with the Expectation-Maximization (EM) algorithm. EM is sensitive to the initial values of the HMM parameters and is likely to terminate at a local maximum of the likelihood function, which yields suboptimal HMM estimates and lower recognition accuracy. In this paper, to obtain better HMM estimates and higher recognition accuracy, several candidate HMMs are created by applying EM to multiple initial models, and the candidate with the highest likelihood is chosen as the best HMM. The initial models are created by varying the maximum frame number in the segmentation step of the HMM initialization process, and a binary search is applied while creating them. The proposed method has been tested on the TIMIT database. Experimental results show that our approach obtains higher likelihood values and improved recognition accuracy.
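
The selection step described above (run EM from several initial models, keep the candidate with the highest likelihood) can be illustrated with a minimal sketch. This is not the paper's implementation. It assumes a toy one-dimensional Gaussian mixture in place of an HMM, and it takes the candidate initial means as a hand-written list rather than deriving them from a segmentation step or a binary search. The names `run_em`, `select_best`, and `var_floor` are hypothetical.

```python
import math

def mixture_density(x, weights, means, variances):
    """Weighted Gaussian density of each mixture component at x."""
    return [w * math.exp(-(x - m) ** 2 / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)
            for w, m, v in zip(weights, means, variances)]

def log_likelihood(data, params):
    """Total log-likelihood of the data under the mixture."""
    weights, means, variances = params
    return sum(math.log(sum(mixture_density(x, weights, means, variances)) + 1e-300)
               for x in data)

def run_em(data, init_means, n_iter=50, var_floor=1e-3):
    """Plain EM for a 1-D Gaussian mixture (toy stand-in for HMM training)."""
    k = len(init_means)
    weights, means, variances = [1.0 / k] * k, list(init_means), [1.0] * k
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point.
        resp = []
        for x in data:
            dens = mixture_density(x, weights, means, variances)
            total = sum(dens) + 1e-300  # guard against underflow to zero
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, var_floor)  # assumed floor to avoid collapse
    return weights, means, variances

def select_best(data, candidate_inits):
    """Train one candidate per initial model and keep the most likely one."""
    candidates = [run_em(data, init) for init in candidate_inits]
    scores = [log_likelihood(data, c) for c in candidates]
    best = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best], scores[best], scores

# Illustrative data and initial models (made up for this sketch).
data = [0.8, 0.9, 1.0, 1.1, 1.2, 4.8, 4.9, 5.0, 5.1, 5.2]
inits = [(0.0, 0.1), (1.0, 5.0), (3.0, 3.1)]
best_params, best_score, all_scores = select_best(data, inits)
print("candidate log-likelihoods:", all_scores)
print("selected means:", best_params[1])
```

Because every candidate is scored on the same training data, the comparison only measures fit to that data. Whether the selected model also recognizes better is an empirical question, which the abstract addresses through its TIMIT experiments.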