Fixed-Point Arithmetic

Embedded and mobile systems have two main requirements: low power consumption, for long battery life and miniaturization, and low unit cost, for components produced in very large numbers (cell phones, set-top boxes). Both requirements are addressed by CPUs with integer-only arithmetic units, which motivates the fixed-point implementation of automatic speech recognition (ASR) algorithms. Large vocabulary continuous speech recognition (LVCSR) can greatly enhance the usability of devices whose small size and typical on-the-go use hinder more traditional interfaces. The increasing computational power of embedded CPUs will soon allow real-time LVCSR on portable, low-cost devices. This chapter reviews the problems that arise in the fixed-point implementation of ASR algorithms and presents fixed-point methods that yield the same recognition accuracy as the floating-point algorithms. In particular, it illustrates a practical approach to implementing the frame-synchronous beam-search Viterbi decoder, N-gram language models, HMM likelihood computation, and the mel-cepstrum front-end. The fixed-point recognizer is shown to be as accurate as the floating-point recognizer in several LVCSR experiments, on the DARPA Switchboard task and on an AT&T proprietary task, using different types of acoustic front-ends, HMMs, and language models. Experiments on the DARPA Resource Management task, using the StrongARM-1100 (206 MHz) and XScale PXA270 (624 MHz) CPUs, show that the fixed-point implementation enables real-time performance: the floating-point recognizer, which relies on software emulation of floating-point arithmetic, is several times slower for the same accuracy.
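As a rough illustration of what integer-only arithmetic involves, the sketch below represents fractional values as scaled 16-bit integers and multiplies them with rounding and saturation. The Q15 format (15 fractional bits) and the helper names `to_q15`, `from_q15`, `q15_mul`, and `q15_add` are assumptions chosen for this example; the chapter's own number formats and scaling choices are not stated in the abstract.

```python
# Minimal fixed-point sketch. The Q15 format and all helper names here are
# illustrative assumptions, not the chapter's actual implementation.

Q = 15                      # number of fractional bits (Q15)
ONE = 1 << Q                # integer that represents 1.0
INT16_MIN = -(1 << 15)      # smallest 16-bit signed value
INT16_MAX = (1 << 15) - 1   # largest 16-bit signed value


def saturate16(x):
    """Clamp an integer to the 16-bit signed range instead of wrapping."""
    return max(INT16_MIN, min(INT16_MAX, x))


def to_q15(x):
    """Convert a real number in roughly [-1, 1) to its Q15 integer code."""
    return saturate16(int(round(x * ONE)))


def from_q15(q):
    """Convert a Q15 integer code back to a float (for inspection only)."""
    return q / ONE


def q15_mul(a, b):
    """Multiply two Q15 values using integer operations only.

    The raw product carries 30 fractional bits, so half of the final
    least-significant bit is added for rounding before shifting right by Q.
    """
    return saturate16((a * b + (1 << (Q - 1))) >> Q)


def q15_add(a, b):
    """Add two Q15 values, saturating on overflow."""
    return saturate16(a + b)


half = to_q15(0.5)
quarter = q15_mul(half, half)
print(from_q15(quarter))    # 0.25
```

Saturating rather than wrapping on overflow is a common choice in signal-processing code, since a clipped value is usually less harmful than a sign flip. The price is limited dynamic range: 1.0 itself is not representable in Q15 and clamps to 32767.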
