Algorithms for data-driven ASR parameter quantization

Abstract: Research on energy-efficient computational devices, and on the applications that run on them, is growing quickly. Automatic speech recognition (ASR) is one of the most compelling applications for mobile devices, and it needs new methods that reduce its computational and memory requirements while preserving a high level of accuracy. Parameter quantization is one way to achieve this. In this work, we compare a variety of novel sub-vector clustering procedures for quantizing ASR system parameters. Specifically, we examine systematic, data-driven sub-vector selection techniques; most are based on entropy minimization, and the others on maximizing recognition accuracy on a development set. We compare performance on two speech databases: PhoneBook, an isolated-word speech recognition task, and TIMIT, a phonetically diverse connected-word speech corpus. Although the optimal entropy-minimizing and accuracy-driven quantization methods are intractable, several simple schemes approximate the optimal clustering well, including scalar quantization with separate codebooks per parameter and joint scalar quantization with normalization.
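As a rough illustration of the simplest scheme named above, scalar quantization with a separate codebook per parameter, the following Python sketch trains one small codebook for each dimension of a set of parameter vectors. It is only a sketch under stated assumptions: the abstract does not say how the codebooks are trained, so a plain 1-D Lloyd (k-means) iteration is used here as a stand-in, and all function names (`lloyd_1d`, `quantize_per_dimension`, and so on) as well as the synthetic data are hypothetical.

```python
import random


def nearest(codebook, value):
    """Index of the codeword closest to value (squared-error distance)."""
    return min(range(len(codebook)), key=lambda j: (codebook[j] - value) ** 2)


def lloyd_1d(values, num_levels, iters=20):
    """Train a scalar codebook with a 1-D Lloyd (k-means) iteration.

    Assumption: the training rule is illustrative, not taken from the paper.
    """
    ordered = sorted(values)
    n = len(ordered)
    # Initialise the codewords at evenly spaced quantiles of the data.
    codebook = [
        ordered[min(n - 1, (2 * k + 1) * n // (2 * num_levels))]
        for k in range(num_levels)
    ]
    for _ in range(iters):
        sums = [0.0] * num_levels
        counts = [0] * num_levels
        for v in values:
            j = nearest(codebook, v)
            sums[j] += v
            counts[j] += 1
        # Move each codeword to the mean of its cell; keep empty cells as-is.
        codebook = [
            sums[j] / counts[j] if counts[j] else codebook[j]
            for j in range(num_levels)
        ]
    return codebook


def quantize_per_dimension(vectors, num_levels):
    """Build one scalar codebook per dimension and encode every vector."""
    dims = len(vectors[0])
    codebooks = [
        lloyd_1d([vec[d] for vec in vectors], num_levels) for d in range(dims)
    ]
    indices = [
        [nearest(codebooks[d], vec[d]) for d in range(dims)] for vec in vectors
    ]
    return codebooks, indices


def dequantize(codebooks, indices):
    """Replace each stored index by its codeword."""
    return [[codebooks[d][j] for d, j in enumerate(row)] for row in indices]


def mean_squared_error(original, reconstructed):
    """Average squared error over all scalar entries."""
    total = sum(
        (a - b) ** 2
        for row_a, row_b in zip(original, reconstructed)
        for a, b in zip(row_a, row_b)
    )
    return total / (len(original) * len(original[0]))


# Synthetic stand-in for model parameters (e.g. Gaussian mean vectors).
rng = random.Random(0)
params = [[rng.gauss(0.0, 1.0), rng.gauss(5.0, 3.0)] for _ in range(200)]
codebooks, indices = quantize_per_dimension(params, num_levels=8)
error = mean_squared_error(params, dequantize(codebooks, indices))
print("mean squared quantization error with 8 levels:", error)
```

With 8 levels per dimension, each parameter is stored as a 3-bit index plus a shared table of 8 codewords per dimension, which is where the memory saving comes from. The sub-vector schemes compared in this work generalize this by quantizing groups of parameters jointly rather than one at a time.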
