Paper information - A 1000-word vocabulary, speaker-independent, continuous live-mode speech recognizer implemented in a single FPGA

The Carnegie Mellon In Silico Vox project seeks to move best-quality speech recognition technology from its current software-only form into a range of efficient all-hardware implementations. The central thesis is that, like graphics chips, the application is simply too performance-hungry and too power-sensitive to stay as a large software application. As a first step in this direction, we describe the design and implementation of a fully functional speech-to-text recognizer on a single Xilinx XUP platform. The design recognizes a 1000-word vocabulary, is speaker-independent, recognizes continuous (connected) speech, and is a "live mode" engine, wherein recognition can start as soon as speech input appears. To the best of our knowledge, this is the most complex recognizer architecture ever fully committed to a hardware-only form. The implementation is extraordinarily small and achieves the same accuracy as state-of-the-art software recognizers while running at a fraction of the clock speed.
