Explicit state occupancy modelling by hidden semi-Markov models: application of Derin's scheme

Abstract

Learning techniques based on the Hidden Markov Model (HMM) assumption have proved efficient in various facets of speech analysis. The application of HMMs to speech-unit modelling suffers, however, from one major deficiency: the implicit state occupancy distribution, a geometric process, is inadequate for modelling speech-segment duration. To avoid this drawback, we propose replacing the underlying Markov chain with a semi-Markov chain, a more general framework in which state occupancy is explicitly modelled by an appropriate probability density function, in our case a gamma distribution. Owing to particular dependency properties inside the semi-Markov chain, the direct adaptation of the Forward-Backward algorithm to the Hidden Semi-Markov Model (HSMM) assumption leads to rather complicated solutions for the training task. With the aim of simplification, we propose adapting Derin's scheme, originally developed in the field of image segmentation. One important characteristic of this scheme is its a posteriori probability formalism, which implicitly introduces a normalization at each step of the basic recursions. This allows us to solve the well-known underflow problem in the training task in a rigorous and efficient manner. The re-estimation formulas for the HSMM parameters are derived according to a maximum-likelihood criterion. Derin's algorithm for HSMMs is presented and a practical implementation is detailed. Results are given on a 130-word isolated-word recognition task.
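To illustrate the modelling difference the abstract describes, the sketch below samples from a hidden semi-Markov chain in which each state draws an explicit, gamma-distributed occupancy duration, rather than relying on the geometric duration implied by HMM self-loops. This is only an illustrative generative sketch, not the paper's training algorithm: the function name, the parameterization, and the rounding used to discretize the continuous gamma draw are all assumptions made for this example.

```python
import random


def sample_hsmm(transitions, duration_params, emit, n_obs, start_state=0, seed=0):
    """Sample an observation sequence from a hidden semi-Markov chain.

    Unlike an HMM, whose implicit state occupancy is geometric (produced
    by self-loop probabilities), each state here draws an explicit duration
    from a gamma distribution, then emits one symbol per time step.
    """
    rng = random.Random(seed)
    state, states, obs = start_state, [], []
    while len(obs) < n_obs:
        shape, scale = duration_params[state]
        # Discretize the continuous gamma draw to an integer duration >= 1
        # (an assumption for this sketch; the paper works with an
        # occupancy distribution over discrete durations).
        d = max(1, round(rng.gammavariate(shape, scale)))
        for _ in range(d):
            if len(obs) == n_obs:
                break
            states.append(state)
            obs.append(rng.choices(range(len(emit[state])), weights=emit[state])[0])
        # Semi-Markov transition: move to a *different* state, since
        # durations are modelled explicitly instead of via self-loops.
        next_states = [s for s in range(len(transitions)) if s != state]
        weights = [transitions[state][s] for s in next_states]
        state = rng.choices(next_states, weights=weights)[0]
    return states, obs


# Usage: two states with gamma(4, 2) durations (mean occupancy of 8 frames)
# and near-deterministic emissions, alternating between the states.
states, obs = sample_hsmm(
    transitions=[[0.0, 1.0], [1.0, 0.0]],
    duration_params=[(4.0, 2.0), (4.0, 2.0)],
    emit=[[0.9, 0.1], [0.1, 0.9]],
    n_obs=20,
)
```

With a gamma occupancy, short and very long segment durations become unlikely relative to the mode, which is the behaviour the geometric distribution (monotonically decreasing from duration 1) cannot capture for speech segments.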
