Evaluation of segmental unit input HMM

A standard HMM cannot fully express time-varying features while it remains in the same state. Various methods have been studied so that the dynamic changes in speech characteristics are not ignored. In this paper, we compare and evaluate a segmental unit input HMM, in which several successive frames are combined into a single input vector, against the conditional density HMM and the use of regression coefficients. Because segmental statistics increase the dimension of the parameters, the covariance matrix is estimated less precisely. We therefore used the K-L expansion and the modified quadratic discriminant function (MQDF) to compress the dimension and reduce computation. Applying segmental unit input to an HMM of the basic structure gave a better recognition rate than the traditional methods, and the combination of a segmental unit of successive mel-cepstrum frames with regression coefficients gave the best recognition rate.
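The two preprocessing steps named above, combining successive frames into one input vector and compressing the result by K-L expansion, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names `stack_frames` and `kl_expansion`, the segment width of 4, the 10-dimensional frames, and the 16 retained components are all assumptions chosen for illustration, and random data stands in for mel-cepstrum frames.

```python
import numpy as np


def stack_frames(frames, width):
    """Concatenate `width` successive frames into one segmental input vector."""
    frames = np.asarray(frames, dtype=float)
    n_frames, _ = frames.shape
    if width < 1 or width > n_frames:
        raise ValueError("width must be between 1 and the number of frames")
    n_segments = n_frames - width + 1
    # Row t holds frames t, t+1, ..., t+width-1 laid end to end.
    return np.stack([frames[t:t + width].reshape(-1) for t in range(n_segments)])


def kl_expansion(vectors, n_components):
    """Project vectors onto the leading eigenvectors of their sample covariance."""
    vectors = np.asarray(vectors, dtype=float)
    mean = vectors.mean(axis=0)
    centered = vectors - mean
    cov = centered.T @ centered / (len(vectors) - 1)
    # eigh returns eigenvalues in ascending order, so reorder to descending.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_components]
    basis = eigvecs[:, order]
    return centered @ basis, basis, mean


# Hypothetical data: 200 frames of 10 coefficients each (random stand-in).
rng = np.random.default_rng(0)
cepstra = rng.standard_normal((200, 10))

segments = stack_frames(cepstra, width=4)           # 4 frames x 10 dims = 40 dims
compressed, basis, mean = kl_expansion(segments, n_components=16)
print(segments.shape, compressed.shape)             # (197, 40) (197, 16)
```

Stacking four 10-dimensional frames gives a 40-dimensional vector whose full covariance matrix has 820 free parameters, which is why compressing the dimension before estimating the covariance matters in this setting.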
