Scalable architecture for word HMM-based speech recognition

This paper presents a scalable architecture for realizing real-time speech recognizers based on a word HMM (hidden Markov model). HMM-based recognition algorithms are classified by acoustic model into two types: the phoneme-level model and the word-level model. The phoneme-level HMM is widely used in current speech recognition systems, which permit large vocabularies, whereas the word-level HMM has been restricted to small vocabularies because of its extremely high computation cost, in spite of its excellent recognition performance. To overcome this limitation, we adopt a scalable architecture focused on the word HMM structure. The proposed architecture can flexibly improve recognition performance and extend the vocabulary, while the computation time hardly increases. To demonstrate a practical solution, we have designed and evaluated a complete recognizer system, including speech analysis and noise robustness, on a 0.18 μm CMOS standard cell library. The recognition time is 35.7 μs/word at a 128 MHz operating frequency. The recognizer can handle more than medium-sized vocabularies with real-time response.
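To illustrate why word-level HMM recognition becomes expensive as the vocabulary grows, the following is a minimal software sketch of isolated-word recognition with one left-to-right HMM per word. It is not the architecture proposed in the paper: the function names (`log_gauss`, `viterbi_score`, `recognize`), the model layout, the diagonal-Gaussian emissions, and the toy one-dimensional features are all assumptions made for illustration. Every word model is scored against every input frame, so the work grows linearly with the number of words.

```python
import math

NEG_INF = float("-inf")

def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def viterbi_score(frames, model):
    """Best-path log likelihood of frames under a left-to-right word HMM.

    model is a hypothetical dict: 'means' and 'vars' hold one vector per
    state; 'log_self' and 'log_next' hold per-state log transition
    probabilities for staying in a state or moving to the next one.
    """
    n = len(model["means"])
    # The path is assumed to start in the first state.
    score = [NEG_INF] * n
    score[0] = log_gauss(frames[0], model["means"][0], model["vars"][0])
    for x in frames[1:]:
        new = [NEG_INF] * n
        for j in range(n):
            stay = score[j] + model["log_self"][j]
            move = score[j - 1] + model["log_next"][j - 1] if j > 0 else NEG_INF
            best = max(stay, move)
            if best > NEG_INF:
                new[j] = best + log_gauss(x, model["means"][j], model["vars"][j])
        score = new
    # The path is assumed to end in the last state.
    return score[-1]

def recognize(frames, vocabulary):
    """Return the word whose HMM gives the highest Viterbi score."""
    return max(vocabulary, key=lambda w: viterbi_score(frames, vocabulary[w]))

def toy_model(means):
    """Build a toy word model with unit variances and 0.5/0.5 transitions."""
    n = len(means)
    return {
        "means": means,
        "vars": [[1.0] for _ in means],
        "log_self": [math.log(0.5)] * n,
        "log_next": [math.log(0.5)] * n,
    }

# Toy two-word vocabulary with two states per word (illustrative values only).
vocab = {
    "low": toy_model([[0.0], [1.0]]),
    "high": toy_model([[5.0], [6.0]]),
}

result = recognize([[0.1], [0.2], [0.9], [1.1]], vocab)
print(result)
```

In this sketch the inner loops over states and frames are repeated once per vocabulary entry, which is the cost that a hardware architecture for word HMMs would need to absorb or parallelize.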
