Memory-efficient buffering method and enhanced reference template for embedded automatic speech recognition system

This work realises a memory-efficient embedded automatic speech recognition (ASR) system on a resource-constrained platform. A buffering method called ultra-low queue-accumulator buffering is presented to make efficient use of the limited memory when extracting linear prediction cepstral coefficient (LPCC) features in the embedded ASR system. The LPCC order is evaluated to find the best trade-off between recognition accuracy and computational cost. In the decoding stage, the proposed enhanced cross-words reference templates (CWRTs) method is incorporated into template matching to achieve speaker-independent recognition without the large memory burden of the conventional CWRTs method. The proposed techniques are implemented on a 16-bit GPCE063A microprocessor platform clocked at 49.152 MHz, with a sampling rate of 8 kHz. Experimental results show a recognition accuracy of 95.22% on a 30-sentence speaker-independent embedded ASR task, using only 0.75 kB of RAM.
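The abstract does not detail how the LPCC features are computed. A common route, sketched here under stated assumptions, windows each frame, computes its autocorrelation, solves for the order-p predictor with the Levinson-Durbin recursion, and converts the predictor coefficients to cepstra with the standard LPC-to-cepstrum recursion. The Hamming window, the floating-point arithmetic (a 16-bit target such as the GPCE063A would normally use fixed point) and all function names are assumptions, not the authors' implementation. `order` is the parameter whose value the work tunes; the value 10 in the usage note is only illustrative.

```python
import math


def autocorrelation(frame, order):
    """Biased autocorrelation r[0..order] of one analysis frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(order + 1)]


def levinson_durbin(r, order):
    """Solve for predictor a[1..order] with s[n] ~ sum_k a[k] * s[n-k].

    Returns (a, err): a[0] is unused (always 0.0) and err is the final
    prediction-error energy.
    """
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # silent or perfectly predictable frame: stop early
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a, err


def lpc_to_cepstrum(a, order, n_ceps):
    """Standard recursion from predictor coefficients to cepstra c[1..n_ceps]."""
    c = [0.0] * (n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = a[n] if n <= order else 0.0
        for k in range(max(1, n - order), n):
            acc += (k / n) * c[k] * a[n - k]
        c[n] = acc
    return c[1:]


def lpcc(frame, order, n_ceps=None):
    """Hypothetical helper: LPCC vector for one frame (float, not fixed point)."""
    n_ceps = order if n_ceps is None else n_ceps
    # Assumption: Hamming window; the paper's windowing is not stated.
    last = len(frame) - 1
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / last))
                for i, x in enumerate(frame)]
    r = autocorrelation(windowed, order)
    a, _ = levinson_durbin(r, order)
    return lpc_to_cepstrum(a, order, n_ceps)
```

For example, a 20 ms frame at the 8 kHz sampling rate holds 160 samples, and `lpcc(frame, 10)` on such a frame returns ten coefficients. Per frame, the autocorrelation costs O(Np) and the Levinson-Durbin step O(p^2), so raising the order p increases computation, which is the accuracy-versus-cost trade-off the order evaluation addresses.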
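The abstract says the enhanced CWRTs are incorporated into a template-matching decoder but does not specify the matcher. The sketch below assumes dynamic time warping (DTW) with several stored templates per sentence and picks the sentence whose closest template has the lowest warped distance. It does not reproduce the enhanced CWRTs construction itself; it only illustrates the baseline matching step such templates would feed. The Euclidean frame distance, the division by n + m as a simple length normalisation, and the names `dtw_distance` and `recognise` are hypothetical choices.

```python
import math


def frame_distance(x, y):
    """Euclidean distance between two feature vectors (e.g. LPCC frames)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(x, y)))


def dtw_distance(query, template):
    """DTW with steps (i-1, j), (i, j-1), (i-1, j-1).

    Only two rows of the cost matrix are kept, so working memory is
    O(len(template)) instead of O(len(query) * len(template)).
    The result is divided by n + m as a simple length normalisation.
    """
    n, m = len(query), len(template)
    inf = float("inf")
    prev = [0.0] + [inf] * m  # row 0: only the origin is reachable
    for i in range(1, n + 1):
        curr = [inf] * (m + 1)
        for j in range(1, m + 1):
            d = frame_distance(query[i - 1], template[j - 1])
            curr[j] = d + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[m] / (n + m)


def recognise(query, templates):
    """Return (label, score) of the closest stored template.

    templates: dict mapping a sentence label to a list of templates,
    each template being a list of feature frames (hypothetical layout).
    """
    best_label, best_score = None, float("inf")
    for label, refs in templates.items():
        for ref in refs:
            score = dtw_distance(query, ref)
            if score < best_score:
                best_label, best_score = label, score
    return best_label, best_score
```

Template storage grows linearly with the number of templates kept per sentence, while the two-row cost matrix keeps the matcher's own working RAM proportional to a single template length.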
