Context-dependent phonetic hidden Markov models for speaker-independent continuous speech recognition

Context-dependent phone models are applied to speaker-independent continuous speech recognition and shown to be effective in this domain. Several previously proposed context-dependent models are evaluated, and two new context-dependent phonetic units are introduced: function-word-dependent phone models, which focus on the most difficult subvocabulary; and generalized triphones, which combine similar triphones on the basis of an information-theoretic measure. The subword clustering procedure used for generalized triphones can find the optimal number of models, given a fixed amount of training data. It is shown that context-dependent modeling reduces the error rate by as much as 60%.
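The idea of combining similar triphones under an information-theoretic measure can be illustrated with a minimal sketch. The abstract does not specify the measure, so the code below assumes one plausible choice: the increase in count-weighted entropy when two discrete output distributions are pooled. The function names, the toy triphone labels, and the fixed `target` number of clusters are all hypothetical and are not taken from the paper.

```python
import math
from itertools import combinations


def entropy(counts):
    """Entropy (in nats) of the distribution implied by raw counts."""
    total = sum(counts)
    return -sum(c / total * math.log(c / total) for c in counts if c > 0)


def merge_cost(a, b):
    """Assumed measure: increase in count-weighted entropy when a and b are pooled."""
    merged = [x + y for x, y in zip(a, b)]
    return (sum(merged) * entropy(merged)
            - sum(a) * entropy(a)
            - sum(b) * entropy(b))


def cluster(models, target):
    """Greedily merge the cheapest pair of models until `target` clusters remain."""
    clusters = {name: list(counts) for name, counts in models.items()}
    while len(clusters) > target:
        # Pick the pair whose merge loses the least information.
        a, b = min(combinations(sorted(clusters), 2),
                   key=lambda p: merge_cost(clusters[p[0]], clusters[p[1]]))
        merged = [x + y for x, y in zip(clusters.pop(a), clusters.pop(b))]
        clusters[a + "+" + b] = merged
    return clusters


# Toy output counts for three hypothetical triphones of /t/.
models = {
    "t(s,r)": [8, 1, 1],
    "t(s,l)": [7, 2, 1],
    "t(a,a)": [1, 1, 8],
}
result = cluster(models, target=2)
```

In this toy example the two triphones with similar output counts are merged first, while the dissimilar one stays separate. The sketch stops at a fixed cluster count; choosing that count from the amount of training data, as the abstract describes, is outside its scope.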
