Optimization of ETSI DSR frontend software on a high-efficient audio DSP

Server-terminal based distributed speech recognition (DSR) applications are widely adopted on mobile devices. In this paper, we implement a power-efficient, high-performance DSR solution for real-time speech processing. The DSR frontend algorithms are carefully optimized in assembly code using acceleration techniques provided by a previously released audio DSP, such as binary scaling operations in a deep instruction pipeline, automatic memory addressing, and parallel processing of packed data. The performance of the DSR frontend software running on the DSP is greatly improved, and our implementation is more efficient than previous solutions. The clock frequency required to process 16 kHz input streams in real time is 124.3 MHz, only about 30% of what is required on a TI C64x DSP. Based on simulation under the SMIC 130 nm process, the power consumed for DSR frontend processing is 23 mW. In addition, the presented implementation is integrated into a server-terminal demo system and is shown to work well in real speech recognition applications.
