Pocketsphinx: A Free, Real-Time Continuous Speech Recognition System for Hand-Held Devices

The availability of real-time continuous speech recognition on mobile and embedded devices has opened up a wide range of research opportunities in human-computer interactive applications. Unfortunately, most of the work in this area to date has been confined to proprietary software, or has focused on limited domains with constrained grammars. In this paper, we present a preliminary case study on the porting and optimization of CMU Sphinx-II, a popular open-source large vocabulary continuous speech recognition (LVCSR) system, to hand-held devices. The resulting system runs, on average, at 0.87 times real time on a 206 MHz device, 8.03 times faster than the baseline system. To our knowledge, this is the first hand-held LVCSR system available under an open-source license.
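To make the "times real-time" figure concrete, the following minimal sketch assumes the usual definition of the real-time factor: processing time divided by audio duration, so that values below 1.0 mean decoding keeps pace with incoming speech. The function names and the 10-second utterance are illustrative assumptions, not part of the system described here.

```python
# Illustrative sketch only: the function names and the example utterance
# length are assumptions, not taken from the system described above.

def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by audio duration (xRT)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def decoding_time(audio_seconds: float, rtf: float) -> float:
    """Return the expected processing time for audio of a given length."""
    return audio_seconds * rtf


REPORTED_RTF = 0.87        # average figure reported in the abstract
utterance_seconds = 10.0   # hypothetical utterance length

needed = decoding_time(utterance_seconds, REPORTED_RTF)
print(f"{utterance_seconds:.0f} s of speech needs about {needed:.1f} s of processing")
print("keeps up with real time:", REPORTED_RTF < 1.0)
```

Under this definition, a decoder at 0.87 times real time finishes slightly before the audio it is processing ends, which is what makes continuous recognition on the device practical.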