Context-dependent quantization for distributed and/or robust speech recognition

It is well known that the strong correlation present in speech signals is helpful in many speech processing applications. In this paper, we propose the new concept of context-dependent quantization, in which the representative parameter (whether a scalar or a vector) for a quantization partition cell is not fixed but depends on the signal context on both sides; these context dependencies can be trained on a clean speech corpus or estimated from a noisy speech corpus. This yields a much finer quantization based on local signal characteristics without any extra bit rate. The approach is applicable to any scalar or vector quantization scheme, and can be used either for signal compression in distributed speech recognition (DSR) or for feature transformation in robust speech recognition; in the latter case, each feature parameter is simply replaced by its representative parameter after quantization. In preliminary experiments with AURORA 2 and simulated GPRS channels, the concept is integrated with a recently proposed histogram-based quantization (HQ), whose partition cells also vary dynamically with local signal statistics. Significant performance improvements were obtained in the presence of both environmental noise and transmission errors.
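To make the idea concrete, the sketch below is a hypothetical illustration, not the authors' implementation: it assumes a scalar quantizer with fixed partition boundaries, a one-frame context on each side, and a lookup table of conditional means trained on clean data. The function names (`train_cdq_table`, `cdq_decode`), the backoff to ordinary cell means, and the single-neighbour context are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical sketch of context-dependent quantization (CDQ): the partition
# boundaries stay fixed, but the representative value emitted for each cell is
# chosen at decode time from a table indexed by the cells of the two
# neighbouring frames.  The table holds conditional means from clean data.

def train_cdq_table(features, boundaries):
    """Estimate context-dependent representatives from a clean feature stream.

    features   : 1-D array, one feature dimension across frames
    boundaries : sorted 1-D array of partition boundaries (len = n_cells - 1)
    Returns (table, fallback): table maps (left_cell, cell, right_cell) to a
    mean value; fallback holds the ordinary context-independent cell means.
    """
    cells = np.searchsorted(boundaries, features)        # cell index per frame
    n_cells = len(boundaries) + 1
    fallback = np.array([features[cells == c].mean() if np.any(cells == c) else 0.0
                         for c in range(n_cells)])
    sums, counts = {}, {}
    for t in range(1, len(features) - 1):
        key = (cells[t - 1], cells[t], cells[t + 1])
        sums[key] = sums.get(key, 0.0) + features[t]
        counts[key] = counts.get(key, 0) + 1
    table = {k: sums[k] / counts[k] for k in sums}
    return table, fallback

def cdq_decode(cells, table, fallback):
    """Replace each frame's cell index with its context-dependent representative."""
    out = np.empty(len(cells), dtype=float)
    for t in range(len(cells)):
        key = (cells[max(t - 1, 0)], cells[t], cells[min(t + 1, len(cells) - 1)])
        out[t] = table.get(key, fallback[cells[t]])       # back off if context unseen
    return out
```

In this toy setting the decoder receives only the usual per-frame cell indices (no extra bits), yet reconstructs each feature from statistics conditioned on its neighbours, which is the sense in which the paper's scheme refines quantization at no additional bit rate.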
