Feature pruning in likelihood evaluation of HMM-based speech recognition

In this work, we present a simple yet effective technique for reducing the cost of likelihood computation in ASR systems that use continuous-density HMMs. In a variety of speech recognition tasks, likelihood evaluation accounts for a significant portion of the total computational load. Under certain conditions, the proposed method evaluates the component likelihoods of only a subset of the features and approximates those of the remaining (pruned) features by prediction. We investigate two feature clustering approaches for use with this pruning technique. A simple sequential clustering works remarkably well, but a data-driven approach saves even more computation while maintaining baseline recognition accuracy. With the data-driven approach, we speed up likelihood evaluation by 33% and reduce its power consumption by 27% on an isolated word recognition task. For a continuous speech recognition system using either monophone or triphone models, the speedup and power reduction of likelihood evaluation are 50% and 35%, respectively.
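
The general idea of evaluating some feature dimensions and predicting the rest can be illustrated with a minimal sketch for a single diagonal-covariance Gaussian component. The abstract does not state the pruning condition or the predictor, so both are assumptions here: pruning is triggered when the partial log-likelihood over the first `n_eval` dimensions falls below a hypothetical `threshold`, and the pruned dimensions are predicted to contribute the same average per-dimension log-likelihood as the evaluated ones. The function names and parameters are illustrative and are not taken from the paper.

```python
import math


def dim_log_likelihood(x, mean, var):
    """Log-density of one feature dimension under a 1-D Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def pruned_log_likelihood(x, mean, var, n_eval, threshold):
    """Diagonal-Gaussian log-likelihood with illustrative feature pruning.

    The first n_eval dimensions are always evaluated exactly.
    ASSUMPTION: if their partial sum is below `threshold`, the remaining
    dimensions are pruned and their contribution is predicted as the
    average evaluated per-dimension value times the number of pruned
    dimensions. Otherwise every dimension is evaluated exactly.
    Returns (log_likelihood, was_pruned).
    """
    n_dims = len(x)
    partial = sum(
        dim_log_likelihood(x[d], mean[d], var[d]) for d in range(n_eval)
    )
    if partial < threshold:
        # Pruned path: extrapolate instead of evaluating the remaining dims.
        predicted_rest = partial / n_eval * (n_dims - n_eval)
        return partial + predicted_rest, True
    # Unpruned path: exact evaluation of the remaining dimensions.
    rest = sum(
        dim_log_likelihood(x[d], mean[d], var[d]) for d in range(n_eval, n_dims)
    )
    return partial + rest, False


if __name__ == "__main__":
    mean = [0.0, 0.0, 0.0, 0.0]
    var = [1.0, 1.0, 1.0, 1.0]
    # A frame close to the component mean is evaluated in full.
    print(pruned_log_likelihood([0.1, -0.2, 0.0, 0.3], mean, var, 2, -10.0))
    # A frame far from the mean in the first two dims is pruned early.
    print(pruned_log_likelihood([5.0, 5.0, 0.0, 0.0], mean, var, 2, -10.0))
```

In this sketch the saving comes from skipping the per-dimension work for components that already score poorly, where a rough estimate is unlikely to change the recognition result. How features are grouped (sequentially or by a data-driven clustering) determines which dimensions fall in the evaluated subset.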
