Can continuous speech recognizers handle isolated speech?

Abstract: Continuous speech is far more natural and efficient than isolated speech for communication. However, for current state-of-the-art automatic speech recognition systems, isolated speech recognition (ISR) is far more accurate than continuous speech recognition (CSR). It is common practice in the speech research community to build CSR systems using only continuous speech data. Yet slowing the speaking rate is a natural reaction for a user faced with the high error rates of current CSR systems. Ironically, CSR systems typically have a much higher word error rate when speakers slow down, since the acoustic models are usually derived exclusively from continuous speech corpora. In this paper, we summarize our efforts to improve the robustness of our speaker-independent CSR system against speaking styles, without suffering a recognition accuracy penalty. In particular, the multi-style trained system described in this paper attains a 7.0% word error rate on a test set consisting of both isolated and continuous speech, in contrast to the 10.9% word error rate achieved by the same system trained only on continuous speech.
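To make the two reported figures concrete, here is a minimal sketch of how a word error rate is conventionally computed (word-level edit distance divided by the number of reference words) and of the relative reduction implied by going from 10.9% to 7.0%. The function names and the example sentences are illustrative assumptions; the abstract does not describe the scoring tool the authors used.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance (substitutions, deletions, insertions)
    divided by the number of words in the reference transcript."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # aligning i reference words to nothing costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction of the baseline error rate removed by the improved system."""
    return (baseline - improved) / baseline


# Hypothetical transcripts, not taken from the paper's test set.
print(word_error_rate("call home now", "call phone"))

# Figures from the abstract: 10.9% (continuous-only training) vs 7.0% (multi-style).
print(round(relative_reduction(10.9, 7.0) * 100, 1))
```

Under this definition, the drop from 10.9% to 7.0% corresponds to a relative error reduction of roughly 36% on the mixed isolated and continuous test set.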
