Recent Progress in Robust Vocabulary-Independent Speech Recognition

This paper reports recent efforts to improve the performance of CMU's robust vocabulary-independent (VI) speech recognition systems on the DARPA speaker-independent resource management task. The improvements are evaluated on 320 sentences that randomly selected from the DARPA June 88, February 89 and October 89 test sets. Our first improvement involves more detailed acoustic modeling. We incorporated more dynamic features computed from the LPC cepstra and reduced error by 15% over the baseline system. Our second improvement comes from a larger training database. With more training data, our third improvement comes from a more detailed subword modeling. We incorporated the word boundary context into our VI subword modeling and it resulted in a 30% error reduction. Finally, we used decision-tree allophone clustering to find more suitable models for the subword units not covered in the training set and further reduced error by 17%. All the techniques combined reduced the VI error rate on the resource management task from 11.1% to 5.4% (and from 15.4% to 7.4% when training and testing were under different recording environment). This vocabulary-independent performance has exceeded our vocabulary-dependent performance.

[1]  Sadaoki Furui,et al.  Speaker-independent isolated word recognition using dynamic features of speech spectrum , 1986, IEEE Trans. Acoust. Speech Signal Process..

[2]  Mei-Yuh Hwang,et al.  Modeling between-word coarticulation in continuous speech recognition , 1989, EUROSPEECH.

[3]  Leo Breiman,et al.  Classification and Regression Trees , 1984 .

[4]  Hsiao-Wuen Hon,et al.  Allophone clustering for continuous speech recognition , 1990, International Conference on Acoustics, Speech, and Signal Processing.

[5]  Mei-Yuh Hwang,et al.  Improved acoustic modeling with the SPHINX speech recognition system , 1991, [Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing.

[6]  Hsiao-Wuen Hon,et al.  On vocabulary-independent speech modeling , 1990, International Conference on Acoustics, Speech, and Signal Processing.

[7]  Lalit R. Bahl,et al.  A tree-based statistical language model for natural language speech recognition , 1989, IEEE Trans. Acoust. Speech Signal Process..

[8]  Hsiao-Wuen Hon,et al.  Towards Speech Recognition Without Vocabulary-Specific Training , 1989, HLT.

[9]  Chin-Hui Lee,et al.  Implementation Aspects Of Large Vocabulary Recognition Based On Intraword And Interword Phonetic Units , 1990, HLT.

[10]  Hsiao-Wuen Hon,et al.  An overview of the SPHINX speech recognition system , 1990, IEEE Trans. Acoust. Speech Signal Process..

[11]  Shigeki Sagayama,et al.  Phoneme environment clustering for speech recognition , 1989, International Conference on Acoustics, Speech, and Signal Processing,.

[12]  Mei-Yuh Hwang,et al.  Improved Hidden Markov Modeling for Speaker-Independent Continuous Speech Recognition , 1990, HLT.

[13]  Michael Picheny,et al.  Large vocabulary natural language continuous speech recognition , 1989, International Conference on Acoustics, Speech, and Signal Processing,.