Non-linear predictive vector quantization of feature vectors for distributed speech recognition

In this paper, we present a non-linear prediction scheme based on a Multi-Layer Perceptron for Predictive Vector Quantization (PVQ-MLP) of Mel-frequency cepstral coefficients (MFCC), aimed at very low bit-rate coding of acoustic features in distributed speech recognition (DSR). Applications such as voice-enabled web browsing or speech-controlled processes in large industrial plants, where hundreds of users simultaneously access the same automatic speech recognition (ASR) server, can benefit from this substantial bit-rate reduction. Experimental results on a large-vocabulary task show that PVQ-MLP improves on a linear prediction scheme in both prediction gain and word error rate (WER), especially at low bit-rates. Using PVQ-MLP, the bit-rate can be reduced to as low as 1.8 kbps, a reduction of 66% with respect to the ETSI standards (4.4 kbps), with a WER degradation lower than 5% compared to a system without quantization.
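The general idea behind predictive vector quantization can be illustrated with a minimal sketch. This is not the authors' implementation: the toy codebook, the function names, and the one-tap linear predictor are all assumptions chosen for illustration. In the scheme the abstract describes, a trained Multi-Layer Perceptron would take the place of `linear_predict`. The sketch only shows the closed loop: predict the current feature vector from the previous reconstruction, quantize the residual, and update the state so that encoder and decoder stay in sync. The helper `bitrate_bps` assumes one codebook index is sent per frame at a caller-supplied frame rate, which the abstract does not specify.

```python
import math

# Closed-loop predictive vector quantization (PVQ), pure-Python sketch.
# Assumptions: toy codebook, zero initial state, illustrative predictor.

def nearest(codebook, vec):
    """Index of the codeword closest to vec (squared Euclidean distance)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(c, vec))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))

def pvq_encode(frames, codebook, predict):
    """Encode each frame as the codebook index of its prediction residual."""
    prev = [0.0] * len(frames[0])      # last reconstruction, shared with decoder
    indices = []
    for x in frames:
        p = predict(prev)              # prediction from the previous reconstruction
        residual = [a - b for a, b in zip(x, p)]
        i = nearest(codebook, residual)
        indices.append(i)
        # Update state with the quantized residual, exactly as the decoder will.
        prev = [a + b for a, b in zip(p, codebook[i])]
    return indices

def pvq_decode(indices, codebook, predict, dim):
    """Rebuild the frames from the indices using the same predictor."""
    prev = [0.0] * dim
    out = []
    for i in indices:
        p = predict(prev)
        prev = [a + b for a, b in zip(p, codebook[i])]
        out.append(prev)
    return out

def linear_predict(prev, a=0.9):
    """Stand-in predictor (hypothetical); an MLP would replace this function."""
    return [a * v for v in prev]

def bitrate_bps(codebook_size, frames_per_second):
    """Bits per second when one index per frame is transmitted."""
    return math.log2(codebook_size) * frames_per_second
```

Because the encoder updates its state from the quantized residual rather than from the original frame, quantization errors do not accumulate differently at the two ends of the channel.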
