Real-Time Continuous Speech Recognition System on SH-4A Microprocessor

To extend continuous speech recognition (CSR) software to mobile environments, we have developed an embedded version of Julius (embedded Julius). Julius is open-source CSR software that has been used by many researchers and developers in Japan as a standard decoder on PCs. In this paper, we describe an implementation of embedded Julius on an SH-4A microprocessor. The SH-4A is a high-end 32-bit MPU (720 MIPS) with an on-chip FPU. Even so, further reduction in computation is necessary for embedded Julius to run in real time. With several optimizations applied, embedded Julius achieves real-time processing on the SH-4A. Experimental results show a processing time of 0.89 times real time (RT), which is 4.0 times faster than the baseline CSR. We also evaluated embedded Julius on a large-vocabulary task (20,000 words), where it runs at almost real time (1.25 times RT).
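
The "times RT" figures can be read as a real-time factor. The abstract does not define the measure, so the sketch below assumes the conventional definition (processing time divided by audio duration, with values at or below 1.0 meaning real-time operation). The function names are illustrative and are not part of Julius.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Assumed definition: decoding time divided by the duration of the input audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def is_real_time(rtf: float) -> bool:
    """A decoder keeps up with incoming speech when its factor is at most 1.0."""
    return rtf <= 1.0


def implied_baseline_rtf(optimized_rtf: float, speedup: float) -> float:
    """Baseline factor implied by a reported speedup, assuming both were
    measured on the same audio (an assumption; the abstract does not say so)."""
    return optimized_rtf * speedup


if __name__ == "__main__":
    # Figures quoted in the abstract: 0.89 x RT and 1.25 x RT (20,000 words).
    print(is_real_time(0.89))
    print(is_real_time(1.25))
    # A 4.0x speedup at 0.89 x RT would put the baseline near 3.56 x RT.
    print(round(implied_baseline_rtf(0.89, 4.0), 2))
```

Under this reading, 0.89 times RT means that 10 seconds of speech is decoded in about 8.9 seconds, while 1.25 times RT means the same audio takes about 12.5 seconds.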
