Paper information - Compression of acoustic features - are perceptual quality and recognition performance incompatible goals?

The client-server model is being advocated for speech recognition over networks: the client calculates the acoustic features, compresses them and transmits them to the server. This has prompted a number of papers claiming that, because recognition accuracy and perceptual quality are different goals, a new compression approach is needed. The claim is backed by experiments in which codecs such as CELP degrade recognition performance, whereas direct quantization of acoustic features at data rates as low as 4 kbps gives little or no degradation. In this paper we show that the goals are not incompatible, and that a very low bit-rate codec can be used to perform the compression. We also show that if the ability to reproduce the speech is really not needed, a bit rate as low as 625 bit/s can be achieved by computing and compressing posterior phone probabilities.

Roger C. F. Tucker | Tony Robinson | James Christie
