Paper information - Probabilistic-trajectory segmental HMMs

Probabilistic-trajectory segmental HMMs

Abstract: “Segmental hidden Markov models” (SHMMs) are intended to overcome important speech-modelling limitations of the conventional-HMM approach by representing sequences (or segments) of features and incorporating the concept of trajectories to describe how features change over time. A novel feature of the approach presented in this paper is that extra-segmental variability between different examples of a sub-phonemic speech segment is modelled separately from intra-segmental variability within any one example. The extra-segmental component of the model is represented in terms of variability in the trajectory parameters, and these models are therefore referred to as “probabilistic-trajectory segmental HMMs” (PTSHMMs). This paper presents the theory of PTSHMMs using a linear trajectory description characterized by slope and mid-point parameters, and presents theoretical and experimental comparisons between different types of PTSHMMs, simpler SHMMs and conventional HMMs. Experiments have demonstrated that, for any given feature set, a linear PTSHMM can substantially reduce the error rate in comparison with a conventional HMM, both for a connected-digit recognition task and for a phonetic classification task. Performance benefits have been demonstrated from incorporating a linear trajectory description and additionally from modelling variability in the mid-point parameter.
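The split between extra-segmental and intra-segmental variability described in the abstract can be illustrated with a small sketch. This is not the authors' formulation: it assumes a single scalar feature, a fixed slope, a Gaussian over the mid-point (the extra-segmental part), and a Gaussian around the trajectory at each frame (the intra-segmental part). All function and parameter names are hypothetical.

```python
import math


def linear_trajectory(midpoint, slope, length):
    """Values of a straight-line trajectory over `length` frames,
    parameterised by its value at the segment centre and its slope."""
    centre = (length - 1) / 2.0
    return [midpoint + slope * (t - centre) for t in range(length)]


def log_gauss(x, mean, var):
    """Log density of a univariate Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def joint_log_prob(obs, midpoint, slope, mid_mean, extra_var, intra_var):
    """Log probability of a segment AND a particular mid-point (assumed model).

    extra term: how plausible this mid-point is for the state
    intra term: how well the frames fit the resulting trajectory
    """
    traj = linear_trajectory(midpoint, slope, len(obs))
    extra = log_gauss(midpoint, mid_mean, extra_var)
    intra = sum(log_gauss(y, f, intra_var) for y, f in zip(obs, traj))
    return extra + intra


def best_midpoint(obs, slope, mid_mean, extra_var, intra_var):
    """Mid-point maximising joint_log_prob (closed form for Gaussians)."""
    centre = (len(obs) - 1) / 2.0
    # Remove the slope so every frame becomes a noisy view of the mid-point.
    detrended = [y - slope * (t - centre) for t, y in enumerate(obs)]
    precision = 1.0 / extra_var + len(obs) / intra_var
    return (mid_mean / extra_var + sum(detrended) / intra_var) / precision
```

For example, with frames `[1.0, 2.0, 3.0]`, slope 1, a mid-point prior centred on 0, and both variances set to 1, `best_midpoint` returns 1.5: the estimate is pulled from the data-only value of 2.0 towards the prior mean, which is the effect of modelling mid-point variability separately from within-segment noise.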

Martin J. Russell | Wendy J. Holmes
