Paper information - Profiling large-vocabulary continuous speech recognition on embedded devices: a hardware resource sensitivity analysis

When deployed in embedded systems, speech recognizers are necessarily reduced from large-vocabulary continuous speech recognizers (LVCSR) found on desktops or servers to fit the limited hardware. However, embedded hardware continues to evolve in capability; today’s smartphones are vastly more powerful than their recent ancestors. This begets a new question: which hardware features not currently found on today’s embedded platforms, but potentially add-ons to tomorrow’s devices, are most likely to improve recognition performance? Said differently – what is the sensitivity of the recognizer to fine-grain details of the embedded hardware resources? To answer this question rigorously and quantitatively, we offer results from a detailed study of LVCSR performance as a function of microarchitecture options on an embedded ARM11 and an enterprise-class Intel Core2Duo. We estimate speed and energy consumption, and show, feature by feature, how hardware resources impact recognizer performance.

Rob A. Rutenbar | Kai Yu
