Histogram-Based Quantization for Robust and/or Distributed Speech Recognition

In a distributed speech recognition (DSR) framework, speech features are quantized and compressed at the client and recognized at the server. Recognition accuracy, however, is degraded by environmental noise at the input, quantization distortion, and transmission errors. This paper proposes histogram-based quantization (HQ), in which the partition cells for quantization are defined dynamically by the histogram, or order statistics, of a segment of the most recent past values of the parameter to be quantized. This scheme is shown to alleviate many of the problems associated with DSR. A joint uncertainty decoding (JUD) approach is further developed to account for the uncertainty caused by both environmental noise and quantization errors, and a three-stage error concealment (EC) framework is developed to handle transmission errors. HQ is also shown to be an attractive feature transformation approach for robust speech recognition outside a DSR environment. All claims have been verified by experiments in the Aurora 2 testing environment, where significant performance improvements over conventional approaches were achieved for both robust and distributed speech recognition.
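The idea of defining partition cells from the order statistics of recent values can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `hq_encode`, the parameters `window` and `levels`, and the choice of equal-probability cells are assumptions made here for illustration only.

```python
from bisect import bisect_left
from collections import deque


def hq_encode(values, window=8, levels=4):
    """Return one cell index per input value (hypothetical helper).

    Assumption: cells are equal-probability partitions of the
    empirical distribution of the most recent `window` values,
    with the current value included in that segment.
    """
    recent = deque(maxlen=window)  # sliding segment of recent values
    indices = []
    for x in values:
        recent.append(x)
        ordered = sorted(recent)
        # Position of x among the recent values (its order statistic).
        rank = bisect_left(ordered, x)
        # Map the rank to one of `levels` cells of equal probability mass.
        indices.append(rank * levels // len(ordered))
    return indices


if __name__ == "__main__":
    features = [3, 1, 4, 1, 5, 9, 2, 6]
    print(hq_encode(features, window=4, levels=2))
    # A gain and offset applied to the input leaves the indices unchanged.
    print(hq_encode([2 * v + 5 for v in features], window=4, levels=2))
```

Because the cell index in this sketch depends only on the rank of the current value within its recent segment, any strictly increasing distortion of the input, such as a gain change or a constant offset, yields the same indices. This is one way to see why a histogram-driven partition can be less sensitive to some input mismatches than a fixed codebook.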
