High-Accuracy Large-Vocabulary Speech Recognition Using Mixture Tying and Consistency Modeling

Improved acoustic modeling can significantly decrease the error rate in large-vocabulary speech recognition. Our approach to the problem is twofold. We first propose a scheme that optimizes the degree of mixture tying for a given amount of training data and computational resources. Experimental results on the Wall Street Journal (WSJ) corpus show that the resulting output distributions achieve a 25% reduction in error rate over typical tied-mixture systems. We then show that an additional improvement can be achieved by modeling local time correlation with linear discriminant features.

Vassilios Digalakis | Hy Murveit
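To make "degree of mixture tying" concrete, the sketch below shows how an HMM state's output probability is computed when groups of states share one codebook of Gaussians but keep their own mixture weights (the authors' related work calls these shared codebooks genones). It is a minimal illustration, not the paper's implementation: it assumes diagonal-covariance Gaussians and a hand-written toy model, and every name (`Gaussian`, `CODEBOOKS`, `STATE_TO_CODEBOOK`, `STATE_WEIGHTS`, `output_log_prob`, `num_gaussians`) is hypothetical. It does not implement the clustering step that decides which states share a codebook.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Gaussian:
    """Diagonal-covariance Gaussian (diagonal covariances are an assumption of this sketch)."""
    mean: list[float]
    var: list[float]

    def log_pdf(self, x: list[float]) -> float:
        # Sum of independent 1-D Gaussian log densities, one per feature dimension.
        return sum(
            -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
            for xi, m, v in zip(x, self.mean, self.var)
        )


def log_sum_exp(values: list[float]) -> float:
    # Numerically stable log(sum(exp(v))) for combining mixture components.
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def mixture_log_prob(x, weights, codebook) -> float:
    # log b_s(x) = log sum_i w_{s,i} N(x; mu_i, var_i); zero-weight components are skipped.
    terms = [math.log(w) + g.log_pdf(x) for w, g in zip(weights, codebook) if w > 0.0]
    return log_sum_exp(terms)


# Hypothetical toy model: two shared codebooks, four HMM states.
CODEBOOKS = {
    "g0": [Gaussian([0.0, 0.0], [1.0, 1.0]), Gaussian([2.0, 0.0], [1.0, 1.0])],
    "g1": [Gaussian([0.0, 3.0], [0.5, 0.5]), Gaussian([1.0, 4.0], [0.5, 0.5])],
}
# Which codebook each state draws its Gaussians from (the tying decision).
STATE_TO_CODEBOOK = {"s0": "g0", "s1": "g0", "s2": "g1", "s3": "g1"}
# State-specific mixture weights: these stay untied even when Gaussians are shared.
STATE_WEIGHTS = {
    "s0": [0.9, 0.1],
    "s1": [0.2, 0.8],
    "s2": [0.5, 0.5],
    "s3": [0.7, 0.3],
}


def output_log_prob(state: str, x: list[float]) -> float:
    # Look up the state's shared codebook, then weight it with the state's own weights.
    codebook = CODEBOOKS[STATE_TO_CODEBOOK[state]]
    return mixture_log_prob(x, STATE_WEIGHTS[state], codebook)


def num_gaussians() -> int:
    # Total distinct Gaussians: the parameter/compute cost that the degree of tying trades off.
    return sum(len(cb) for cb in CODEBOOKS.values())


if __name__ == "__main__":
    x = [0.5, 0.2]
    for s in STATE_WEIGHTS:
        print(s, round(output_log_prob(s, x), 4))
    print("distinct Gaussians:", num_gaussians())
```

Mapping every state to a single codebook gives the typical tied-mixture system the abstract compares against; giving every state its own codebook gives a fully continuous HMM. Intermediate mappings like the one above trade the number of distinct Gaussians against how much training data each Gaussian sees.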