A Decoder for LVCSR Based on Fixed-Point Arithmetic

The increasing computational power of embedded CPUs motivates the fixed-point implementation of highly accurate large-vocabulary continuous-speech recognition (LVCSR) algorithms, so that the device can match the performance obtained on the server. We report on methods for the fixed-point implementation of the frame-synchronous beam-search Viterbi decoder, N-gram language models, and HMM likelihood computation. This fixed-point recognizer is as accurate as our best floating-point recognizer in several LVCSR experiments on the DARPA Switchboard task and on an AT&T proprietary task, with different types of acoustic front-ends and HMMs. We also present experiments on the DARPA Resource Management task using the StrongARM-1100 206 MHz CPU, where the fixed-point implementation enables real-time performance: the floating-point recognizer, with floating-point software emulation, is 50 times slower for the same accuracy.
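
As a rough illustration of what a fixed-point HMM likelihood computation can look like, the sketch below evaluates a diagonal-covariance Gaussian log-likelihood using only integer arithmetic and compares it with a floating-point reference. It is not the method of this paper: the Q-format with 12 fractional bits, the helper names (`to_fixed`, `fixed_mul`, `gaussian_loglik_fixed`), and the choice to precompute 1/(2·variance) and the log normalization constant offline are all assumptions made for the example.

```python
import math

FRAC_BITS = 12  # assumed Q-format: 12 fractional bits (hypothetical choice)


def to_fixed(x, frac_bits=FRAC_BITS):
    """Quantize a float to a fixed-point integer."""
    return int(round(x * (1 << frac_bits)))


def to_float(q, frac_bits=FRAC_BITS):
    """Convert a fixed-point integer back to a float."""
    return q / (1 << frac_bits)


def fixed_mul(a, b, frac_bits=FRAC_BITS):
    """Multiply two fixed-point values; the raw product carries
    2 * frac_bits fractional bits, so shift right to rescale."""
    return (a * b) >> frac_bits


def gaussian_loglik_fixed(x_q, mean_q, half_inv_var_q, log_const_q):
    """Integer-only log-likelihood of a diagonal Gaussian.

    half_inv_var_q holds 1 / (2 * variance) per dimension and
    log_const_q holds -0.5 * sum(log(2 * pi * variance)), both
    quantized offline. Python integers do not overflow; a real
    embedded port would need explicit word sizes and saturation.
    """
    acc = log_const_q
    for x, m, w in zip(x_q, mean_q, half_inv_var_q):
        d = x - m
        acc -= fixed_mul(fixed_mul(d, d), w)
    return acc


def gaussian_loglik_float(x, mean, var):
    """Floating-point reference for comparison."""
    const = -0.5 * sum(math.log(2.0 * math.pi * v) for v in var)
    return const - sum((a - m) ** 2 / (2.0 * v) for a, m, v in zip(x, mean, var))


# Toy two-dimensional example (values are illustrative only).
x, mean, var = [0.5, -1.25], [0.0, -1.0], [1.0, 0.25]
x_q = [to_fixed(v) for v in x]
mean_q = [to_fixed(v) for v in mean]
half_inv_var_q = [to_fixed(1.0 / (2.0 * v)) for v in var]
log_const_q = to_fixed(-0.5 * sum(math.log(2.0 * math.pi * v) for v in var))

fixed_result = to_float(gaussian_loglik_fixed(x_q, mean_q, half_inv_var_q, log_const_q))
float_result = gaussian_loglik_float(x, mean, var)
print(fixed_result, float_result)
```

With this toy input the two results agree closely, which is the kind of behaviour a fixed-point recognizer relies on; how many fractional bits are actually needed depends on the dynamic range of the features and model parameters.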
