Efficient codebooks for fast and accurate low resource ASR systems

Speech interfaces are now widely used on mobile devices, so recognition speed and resource consumption have become important measures of Automatic Speech Recognition (ASR) performance. In ASR systems based on continuous Hidden Markov Models (HMMs), computing the state likelihoods is one of the most time-consuming steps. In this paper, we propose novel multi-level Gaussian selection techniques that reduce the cost of this computation. These methods rely on original and efficient codebooks. The proposed algorithms are evaluated on a large vocabulary continuous speech recognition task.
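To make the general idea of codebook-based Gaussian selection concrete, the following is a minimal single-level sketch and not the multi-level method proposed in this paper. It assumes diagonal-covariance Gaussians, a small codebook of centroids, and a distance-based shortlist rule. All names (`build_shortlists`, `state_log_likelihood`, `radius`, `floor`) are hypothetical and chosen only for illustration.

```python
import math

def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def build_shortlists(centroids, means, radius):
    """Offline step: for each codeword, keep the Gaussians whose mean
    lies within `radius` of the centroid (an assumed selection rule)."""
    return [
        [g for g, m in enumerate(means) if sq_dist(c, m) <= radius ** 2]
        for c in centroids
    ]

def state_log_likelihood(x, weights, means, variances,
                         centroids, shortlists, floor=-1e3):
    """Approximate log p(x | state) from the shortlisted Gaussians only."""
    # Quantize the frame: pick the nearest codeword.
    k = min(range(len(centroids)), key=lambda i: sq_dist(x, centroids[i]))
    scores = [
        math.log(weights[g]) + log_gauss(x, means[g], variances[g])
        for g in shortlists[k]
    ]
    if not scores:
        return floor  # assumed back-off value when nothing is shortlisted
    top = max(scores)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(s - top) for s in scores))

# Toy state with two well-separated Gaussians (illustrative values only).
weights = [0.5, 0.5]
means = [[0.0, 0.0], [10.0, 10.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
centroids = [[0.0, 0.0], [10.0, 10.0]]

shortlists = build_shortlists(centroids, means, radius=3.0)
frame = [0.5, -0.2]
approx = state_log_likelihood(frame, weights, means, variances,
                              centroids, shortlists)
# Reference value: evaluate every Gaussian for every codeword.
full = state_log_likelihood(frame, weights, means, variances,
                            centroids, [[0, 1], [0, 1]])
```

In this toy setup only one of the two Gaussians is evaluated for the frame, yet `approx` and `full` are practically identical because the skipped component contributes a negligible amount. The trade-off between shortlist size and accuracy is what a selection scheme of this kind has to balance.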
