Continuous speech recognition using segmental unit input HMMs with a mixture of probability density functions and context dependency

It is well known that HMMs with only the basic structure cannot adequately capture the correlations among successive frames. In our previous work, segmental unit HMMs were introduced to address this problem, and their effectiveness was demonstrated. That work also found that integrating Δcepstrum and ΔΔcepstrum into the segmental unit HMMs improved recognition performance. In this paper, we investigate further refinements of the models using a mixture of PDFs and/or context dependency, where, for a given syllable, only the preceding vowel is treated as context information. Recognition experiments showed that the accuracy rate improved by 23 %, which clearly indicates the effectiveness of the refinements examined in this paper. The proposed syllable-based HMM outperformed a triphone model.
