Four-level tied-structure for efficient representation of acoustic modeling

One of the problems with context-dependent HMMs is that a large number of model parameters must be estimated from a limited amount of training data. To represent acoustic models efficiently, parameters that share the same property should be tied. This paper proposes a four-level tied structure for phoneme models. The four levels are 1) the model level, 2) the state level, 3) the distribution level, and 4) the feature parameter level. Although techniques have already been proposed for the first three levels, feature parameter tying at the fourth level is newly proposed in this paper. We found that feature parameter tying makes it possible to represent the 1,600 mean vectors of multivariate Gaussian mixture HMMs by combining 16 representative mean values in each dimension. Experimental results show that feature parameter tying reduces the amount of computation required for recognition without significantly degrading performance. Furthermore, we found that feature parameter tying is also effective for model training.
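The following is a minimal sketch of what per-dimension feature parameter tying could look like. The abstract does not say how the 16 representative values are chosen, so this sketch assumes they come from running one-dimensional k-means (Lloyd's algorithm) separately on each dimension of the mean vectors. The function names (`tie_dimension`, `tie_feature_parameters`, `tied_distances`), the synthetic data, and the omission of variances (identity covariance) are all illustrative assumptions. The sketch also shows one plausible source of the computational saving: for an input frame, the squared difference to each representative value is computed once per dimension (16 × D operations), and every Gaussian's distance then reduces to D table lookups and additions.

```python
import random


def tie_dimension(values, k, iters=20):
    """Cluster one dimension's mean values into k representative values
    using 1-D k-means (Lloyd's algorithm). Returns (codebook, indices)."""
    n = len(values)
    ordered = sorted(values)
    # Initialize at evenly spaced quantiles of the data (requires k <= n).
    codebook = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]

    def nearest(v, cb):
        return min(range(k), key=lambda j: abs(v - cb[j]))

    for _ in range(iters):
        idx = [nearest(v, codebook) for v in values]
        sums, counts = [0.0] * k, [0] * k
        for v, j in zip(values, idx):
            sums[j] += v
            counts[j] += 1
        # Move each representative value to the mean of its members;
        # leave it unchanged if no value was assigned to it.
        codebook = [sums[j] / counts[j] if counts[j] else codebook[j]
                    for j in range(k)]
    return codebook, [nearest(v, codebook) for v in values]


def tie_feature_parameters(means, k=16):
    """Hypothetical helper: replace every mean value by one of k
    representative values in its dimension. Returns per-dimension
    codebooks and, for each mean vector, its per-dimension indices."""
    dim = len(means[0])
    codebooks = []
    codes = [[0] * dim for _ in means]
    for d in range(dim):
        cb, idx = tie_dimension([m[d] for m in means], k)
        codebooks.append(cb)
        for i, j in enumerate(idx):
            codes[i][d] = j
    return codebooks, codes


def tied_distances(x, codebooks, codes):
    """Squared Euclidean distance from frame x to every tied mean vector."""
    # dim * k squared differences, independent of the number of Gaussians.
    table = [[(x[d] - c) ** 2 for c in codebooks[d]] for d in range(len(x))]
    # Each Gaussian then needs only dim table lookups and additions.
    return [sum(table[d][j] for d, j in enumerate(code)) for code in codes]


# Usage on synthetic data: 1,600 four-dimensional means, 16 values per dimension.
random.seed(0)
means = [[random.gauss(0.0, 1.0) for _ in range(4)] for _ in range(1600)]
codebooks, codes = tie_feature_parameters(means, k=16)
x = [0.1, -0.2, 0.3, 0.0]
dists = tied_distances(x, codebooks, codes)
```

In this synthetic setting, storage drops from 1,600 × 4 floating-point mean values to 16 × 4 representative values plus a 4-bit index per mean value (since 16 = 2^4). A diagonal-covariance variant would additionally divide each looked-up term by that Gaussian's variance in the same dimension.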
