Automatic speech recognition in Mandarin for embedded platforms

In this paper, we describe a real-time automatic speech recognition system for Mandarin that runs on low-cost embedded platforms built around fixed-point digital signal processors. The hands-free, speaker-independent system uses 41 monophone models to represent the sounds of Mandarin Chinese and 11 whole-word models for connected digit recognition. It achieves greater than 98% recognition accuracy on our hands-free test database of 46 distinct command phrases, and 95.9% digit accuracy on a 14-speaker, hands-free, connected digit recognition database. Analysis of the results shows that, for speakers without a dialect, digit recognition accuracy is almost 98%. We present a detailed analysis of the digit recognition results and propose further improvements. A real-time platform based on Lucent’s DSP1627 fixed-point digital signal processor has been developed.
