Towards efficient and scalable speech compression schemes for robust speech recognition applications

This paper presents a scheme for distributed automatic speech recognition. A hidden Markov model (HMM)-based speech recognition system with a mel-frequency cepstral coefficient (MFCC) front end was used in the evaluation. The goal was to compress the MFCC feature vectors while maintaining good recognition performance. Compression rates and recognition performance are reported for both a digit database and an alphabet database. Compared with recognizing speech coded by low-bit-rate speech coders, and with previously reported schemes, our method can achieve good recognition performance at bit rates below 1 kbps with low encoding complexity. The encoding algorithms are scalable, allowing trade-offs between bit rate and recognition performance. They can also be combined with unequal error protection or prioritization so that performance degrades gracefully in the presence of channel errors.
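The abstract does not describe the encoder itself, so the following is only a minimal sketch, under stated assumptions, of the bit-rate arithmetic and of one way an MFCC compressor can be made scalable. The sketch assumes 100 frames per second (a 10 ms frame shift), independent uniform scalar quantization of each coefficient over a fixed range, and a hypothetical allocation of [3, 2, 2, 1, 1] bits to five coefficients. None of these choices come from the paper, and the paper's actual encoder may differ. At 100 frames/s, staying below 1 kbps means spending fewer than 10 bits per frame. Scalability is illustrated with an embedded quantizer: removing the least significant bits of each index yields exactly the index a coarser uniform quantizer would have produced, so the bit rate can be lowered without re-encoding. The function names (`quantize`, `drop_bits`, `dequantize`, `bit_rate_bps`) are hypothetical.

```python
import numpy as np

# Assumption: 10 ms frame shift, i.e. 100 MFCC frames per second.
FRAMES_PER_SECOND = 100


def bit_rate_bps(bits_per_coeff, frames_per_second=FRAMES_PER_SECOND):
    """Bit rate (bits/s) for a fixed number of bits per coefficient per frame."""
    return frames_per_second * sum(bits_per_coeff)


def quantize(frames, lo, hi, bits_per_coeff):
    """Uniform scalar quantization of each coefficient to an integer cell index.

    frames: (num_frames, num_coeffs) array; lo, hi: per-coefficient ranges.
    Values outside [lo, hi) are clipped to the first or last cell.
    """
    levels = 2 ** np.asarray(bits_per_coeff)
    scaled = (frames - lo) / (hi - lo)  # map each coefficient range to [0, 1)
    idx = np.floor(scaled * levels).astype(np.int64)
    return np.clip(idx, 0, levels - 1)


def drop_bits(idx, bits_per_coeff, drop):
    """Embedded scalability: removing the `drop` least significant bits of each
    index gives the index that a coarser uniform quantizer would have produced."""
    bits = np.asarray(bits_per_coeff)
    new_bits = np.maximum(bits - drop, 0)
    return idx >> (bits - new_bits), new_bits


def dequantize(idx, lo, hi, bits_per_coeff):
    """Reconstruct each coefficient at the midpoint of its quantization cell."""
    levels = 2 ** np.asarray(bits_per_coeff)
    return lo + (idx + 0.5) * (hi - lo) / levels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = rng.normal(size=(200, 5))  # synthetic stand-in for 5 MFCCs per frame
    lo, hi = np.full(5, -3.0), np.full(5, 3.0)
    bits = [3, 2, 2, 1, 1]  # hypothetical allocation: 9 bits per frame

    idx = quantize(frames, lo, hi, bits)
    print(bit_rate_bps(bits))  # 900

    # Lower-rate layer obtained by truncating indices, without re-encoding.
    coarse, coarse_bits = drop_bits(idx, bits, drop=1)
    print(bit_rate_bps(coarse_bits))  # 400
    approx = dequantize(coarse, lo, hi, coarse_bits)
    print(np.mean(np.abs(approx - frames)))  # mean reconstruction error
```

Dropping one bit per coefficient in this example lowers the rate from 900 bps to 400 bps at the cost of coarser features. In a prioritized or unequal-error-protection setting, the high-order bits of each index would be natural candidates for stronger protection, because losing them hurts reconstruction most.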
