Probabilistic-trajectory segmental HMMs

Abstract: “Segmental hidden Markov models” (SHMMs) are intended to overcome important speech-modelling limitations of the conventional-HMM approach by representing sequences (or segments) of features and incorporating the concept of trajectories to describe how features change over time. A novel feature of the approach presented in this paper is that extra-segmental variability between different examples of a sub-phonemic speech segment is modelled separately from intra-segmental variability within any one example. The extra-segmental component of the model is represented in terms of variability in the trajectory parameters, and these models are therefore referred to as “probabilistic-trajectory segmental HMMs” (PTSHMMs). This paper presents the theory of PTSHMMs using a linear trajectory description characterized by slope and mid-point parameters, and presents theoretical and experimental comparisons between different types of PTSHMMs, simpler SHMMs and conventional HMMs. Experiments have demonstrated that, for any given feature set, a linear PTSHMM can substantially reduce the error rate in comparison with a conventional HMM, both for a connected-digit recognition task and for a phonetic classification task. Performance benefits have been demonstrated from incorporating a linear trajectory description and additionally from modelling variability in the mid-point parameter.
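The separation of extra-segmental and intra-segmental variability can be illustrated with a minimal sketch. It is not the paper's implementation, and it rests on several assumptions that the abstract does not state: a single one-dimensional feature, a fixed slope, a Gaussian distribution over the mid-point (the extra-segmental part), independent Gaussian scatter of frames around the trajectory (the intra-segmental part), and a segment probability obtained by integrating over the mid-point. The function name and its parameters are hypothetical.

```python
import math


def linear_ptshmm_segment_loglik(y, mid_mean, slope, var_extra, var_intra):
    """Log-likelihood of a 1-D segment under an assumed linear trajectory model.

    The mid-point is drawn from N(mid_mean, var_extra) once per segment
    (extra-segmental variability); each frame then scatters around the
    resulting line with variance var_intra (intra-segmental variability).
    The mid-point is integrated out analytically.
    """
    T = len(y)
    # Time offsets centred on the segment, so the mid-point parameter is
    # the trajectory value at the centre of the segment.
    offsets = [t - (T - 1) / 2.0 for t in range(T)]
    # Residuals from the mean trajectory mid_mean + slope * offset.
    r = [y_t - (mid_mean + slope * tau) for y_t, tau in zip(y, offsets)]
    sum_r = sum(r)
    sum_r2 = sum(x * x for x in r)
    # The marginal covariance is var_intra * I + var_extra * (ones ones^T);
    # its determinant and inverse have simple closed forms.
    total = var_intra + T * var_extra
    log_det = (T - 1) * math.log(var_intra) + math.log(total)
    quad = (sum_r2 - (var_extra / total) * sum_r ** 2) / var_intra
    return -0.5 * (T * math.log(2.0 * math.pi) + log_det + quad)


# Illustrative values only (not data from the paper).
segment = [1.0, 1.5, 2.1, 2.4]
print(linear_ptshmm_segment_loglik(segment, mid_mean=1.8, slope=0.5,
                                   var_extra=0.3, var_intra=0.05))
```

Under these assumptions, setting `var_extra` to zero leaves a fixed linear trajectory with independent frame scatter, and additionally setting `slope` to zero gives independent frames around a constant mean, as in a single-Gaussian conventional HMM state.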
