Paper information - Performing speech recognition on multiple parallel files using continuous hidden Markov models on an FPGA

Speech recognition is a computationally demanding task, particularly the stages that use Viterbi decoding to convert pre-processed speech data into words or subword units, and the associated observation probability calculations, which employ multivariate Gaussian distributions. Any device that can reduce the load on, for example, a PC's processor is therefore advantageous. Hence we present two implementations of a speech recognition system incorporating an FPGA, employing continuous hidden Markov models (HMMs), and capable of processing three speech files simultaneously. The first uses monophones and can perform recognition at 250 times real time (in terms of average time per observation), as well as outperforming its software equivalent. The second uses biphones and triphones, reducing the speedup to 13 times real time.
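The two stages named in the abstract, Gaussian observation probabilities and Viterbi decoding, can be illustrated with a minimal software sketch. This is not the paper's FPGA design. It assumes diagonal covariance matrices, a single Gaussian per state, and log-domain arithmetic, and the names `log_gaussian_diag` and `viterbi` are hypothetical, chosen only for this illustration.

```python
import math

NEG_INF = float("-inf")


def log_gaussian_diag(x, mean, var):
    """Log density of a multivariate Gaussian at x.

    Assumption: the covariance is diagonal, so the density factorises
    into independent one-dimensional Gaussians.
    """
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def viterbi(observations, log_init, log_trans, means, variances):
    """Return (best_log_prob, best_state_path) for a list of observation vectors."""
    n_states = len(log_init)
    # delta[j]: best log probability of any path ending in state j at this frame
    delta = [
        log_init[j] + log_gaussian_diag(observations[0], means[j], variances[j])
        for j in range(n_states)
    ]
    backpointers = []
    for x in observations[1:]:
        new_delta, pointers = [], []
        for j in range(n_states):
            # Most likely predecessor of state j
            best_i = max(range(n_states), key=lambda i: delta[i] + log_trans[i][j])
            pointers.append(best_i)
            new_delta.append(
                delta[best_i]
                + log_trans[best_i][j]
                + log_gaussian_diag(x, means[j], variances[j])
            )
        delta = new_delta
        backpointers.append(pointers)
    # Backtrack from the most likely final state
    state = max(range(n_states), key=lambda j: delta[j])
    best_log_prob = delta[state]
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return best_log_prob, path


# Toy two-state left-to-right model with one-dimensional observations
# (all values are made up for illustration).
toy_means = [[0.0], [5.0]]
toy_vars = [[1.0], [1.0]]
toy_log_init = [0.0, NEG_INF]
toy_log_trans = [[math.log(0.5), math.log(0.5)], [NEG_INF, 0.0]]
toy_obs = [[0.1], [0.2], [4.9], [5.1]]

toy_log_prob, toy_path = viterbi(toy_obs, toy_log_init, toy_log_trans, toy_means, toy_vars)
print(toy_path)  # [0, 0, 1, 1]
```

Working in the log domain turns the products of probabilities into sums, which avoids numerical underflow on long utterances. The inner loop over states and frames is the part whose cost grows with model size, which is why the abstract singles out these stages as the ones worth offloading.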

Steven F. Quigley | Martin J. Russell | Stephen J. Melnikoff

[1] Steven F. Quigley, et al. Implementing a Hidden Markov Model Speech Recognition System in Programmable Logic, 2001, FPL.

[2] Giuseppe Riccardi, et al. How may I help you?, 1997, Speech Commun.

[3] Lawrence R. Rabiner, et al. A tutorial on hidden Markov models and selected applications in speech recognition, 1989, Proc. IEEE.

[4] Steven F. Quigley, et al. Speech Recognition on an FPGA Using Discrete and Continuous Hidden Markov Models, 2002, FPL.

[5] Steve Young, et al. A review of large-vocabulary continuous-speech recognition, 1996, IEEE Signal Process. Mag.

[6] Steve J. Young, et al. Large vocabulary continuous speech recognition using HTK, 1994, Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing.

[7] Wendy J. Holmes, et al. Speech Synthesis and Recognition, 1988.

[8] Steve Young, et al. A review of large-vocabulary continuous-speech recognition, 1996.

[9] Vassilios Digalakis, et al. A Configurable Logic Based Architecture for Real-Time Continuous Speech Recognition Using Hidden Markov Models, 2000, J. VLSI Signal Process.