Genones: optimizing the degree of mixture tying in a large vocabulary hidden Markov model based speech recognizer

We propose a scheme that improves the robustness of continuous HMM systems that use mixture observation densities by sharing the same mixture components among different HMM states. The sets of HMM states that share mixture components are determined automatically by agglomerative clustering. Experimental results on the Wall Street Journal corpus show that this new form of output distribution achieves a 25% reduction in error rate over typical tied-mixture systems.
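
The clustering step can be illustrated with a minimal sketch. The abstract does not state the distance measure or stopping rule, so everything below is an assumption made for illustration: each HMM state is summarized by an occupancy count and a vector of mixture weights over a shared codebook, the cost of merging two clusters is the increase in count-weighted entropy, and merging stops at a target number of clusters. The names `entropy`, `merge`, `merge_cost`, and `cluster_states` are hypothetical and not taken from the paper.

```python
import math


def entropy(weights):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(w * math.log(w) for w in weights if w > 0.0)


def merge(a, b):
    """Pool two (count, weights) summaries into one count-weighted summary."""
    (count_a, weights_a), (count_b, weights_b) = a, b
    total = count_a + count_b
    pooled = [(count_a * x + count_b * y) / total
              for x, y in zip(weights_a, weights_b)]
    return total, pooled


def merge_cost(a, b):
    """Increase in count-weighted entropy caused by merging a and b.

    This cost is an assumed distance measure, chosen for illustration.
    """
    total, pooled = merge(a, b)
    return (total * entropy(pooled)
            - a[0] * entropy(a[1])
            - b[0] * entropy(b[1]))


def cluster_states(states, num_clusters):
    """Greedy bottom-up (agglomerative) clustering of HMM states.

    states: dict mapping a state name to (occupancy count, mixture weights).
    Returns a list of sets of state names; the states in one set would
    share a single set of mixture components.
    """
    clusters = [({name}, stats) for name, stats in states.items()]
    while len(clusters) > num_clusters:
        # Find the pair of clusters that is cheapest to merge.
        pairs = ((i, j)
                 for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs,
                   key=lambda ij: merge_cost(clusters[ij[0]][1],
                                             clusters[ij[1]][1]))
        merged = (clusters[i][0] | clusters[j][0],
                  merge(clusters[i][1], clusters[j][1]))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return [names for names, _ in clusters]


if __name__ == "__main__":
    # Toy data (made up): four states over a three-component codebook.
    toy_states = {
        "a": (10, [0.9, 0.1, 0.0]),
        "b": (10, [0.8, 0.2, 0.0]),
        "c": (10, [0.0, 0.1, 0.9]),
        "d": (10, [0.0, 0.2, 0.8]),
    }
    print(cluster_states(toy_states, 2))
```

On the toy data, states with similar weight vectors end up in the same cluster. Varying `num_clusters` between one (a fully tied-mixture system) and the number of states (no sharing) is one way to picture the degree of tying that the title refers to.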
