An efficient low bit-rate compression scheme of acoustic features for distributed speech recognition

A low bit-rate source coding scheme for distributed speech recognition (DSR) systems is proposed. The algorithm is based on weighted least squares (W-LS) polynomial approximation. Its efficiency is tested on the noisy Aurora-2 database for bit-rates ranging from 1400 bps to 1925 bps. The results generally outperform the ETSI-AFE encoder for clean training and provide similar performance, at 1925 bps, for multi-condition training.

Because network bandwidth is limited, distributed speech recognition (DSR) services need a noise-robust, low bit-rate compression scheme for Mel frequency cepstral coefficients (MFCCs). In this paper, we present an efficient MFCC compression method based on weighted least squares (W-LS) polynomial approximation, which exploits the high correlation across consecutive MFCC frames. The polynomial coefficients are quantized with a scheme based on tree-structured vector quantization (TSVQ). Recognition experiments are conducted on the noisy Aurora-2 database under both clean and multi-condition training modes. The results show that the proposed W-LS encoder slightly exceeds the ETSI advanced front-end (ETSI-AFE) baseline system for bit-rates ranging from 1400 bps to 1925 bps under clean training. Under multi-condition training, a negligible degradation is observed (around 0.6% and 0.2% at 1400 bps and 1925 bps, respectively). Furthermore, the proposed encoder generally outperforms the ETSI-AFE source encoder at 4400 bps under clean training and provides similar performance, at 1925 bps, under multi-condition training.
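
As a rough illustration of the idea in the abstract, the sketch below fits a low-order polynomial to one block of consecutive values of a single MFCC coefficient by weighted least squares, so that the block can be represented by a few polynomial coefficients instead of one value per frame. The block length, polynomial degree, weights, time normalization, and all function names here are assumptions made for illustration; the abstract does not specify them, and this is not the authors' implementation. The TSVQ quantization step is left out.

```python
# Minimal weighted least squares (W-LS) polynomial fit, standard library only.
# All names and parameter choices are hypothetical, not taken from the paper.

def normalized_times(n):
    """Map frame indices 0..n-1 onto [-1, 1] to keep the fit well conditioned."""
    if n < 2:
        raise ValueError("need at least two frames in a block")
    return [2.0 * i / (n - 1) - 1.0 for i in range(n)]


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, m):
            factor = M[r][col] / M[col][col]
            for c in range(col, m + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, m))
        x[r] = (M[r][m] - s) / M[r][r]
    return x


def wls_polyfit(values, weights, degree):
    """Return c[0..degree] minimizing sum_i w_i * (y_i - sum_k c_k t_i**k)**2."""
    ts = normalized_times(len(values))
    m = degree + 1
    # Weighted normal equations: (V^T W V) c = V^T W y, with V the Vandermonde matrix.
    A = [[sum(w * t ** (j + k) for w, t in zip(weights, ts)) for k in range(m)]
         for j in range(m)]
    b = [sum(w * y * t ** j for w, y, t in zip(weights, values, ts))
         for j in range(m)]
    return solve_linear(A, b)


def reconstruct(coeffs, n):
    """Evaluate the fitted polynomial at the n frame positions of a block."""
    return [sum(c * t ** k for k, c in enumerate(coeffs))
            for t in normalized_times(n)]


# Usage: one hypothetical block of 8 frames of a single cepstral coefficient.
block = [0.9, 1.4, 1.8, 2.1, 2.2, 2.0, 1.7, 1.1]
weights = [1.0, 1.0, 2.0, 2.0, 2.0, 2.0, 1.0, 1.0]  # assumed per-frame weights
coeffs = wls_polyfit(block, weights, degree=2)
approx = reconstruct(coeffs, len(block))
print(len(block), "frame values represented by", len(coeffs), "coefficients")
```

In this sketch the decoder would rebuild the block by calling `reconstruct` on the received coefficients. Larger weights pull the fitted curve closer to the corresponding frames, which is the only difference from an ordinary least squares fit.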
