In this paper, we describe a real-time automatic speech recognition system for Mandarin Chinese that runs on low-cost embedded platforms built around fixed-point digital signal processors. The hands-free, speaker-independent system uses 41 monophone models to represent the sounds of Mandarin Chinese and 11 whole-word models for connected digit recognition. It achieves greater than 98% recognition accuracy on our hands-free test database of 46 distinct command phrases and 95.9% digit accuracy on a 14-speaker, hands-free, connected digit recognition database. Analysis of the results shows that, for speakers without dialect, digit recognition accuracy is almost 98%. We present a detailed analysis of the digit recognition results and propose further improvements. A real-time platform based on Lucent’s DSP1627 fixed-point digital signal processor has been developed.