Paper information - LOW BIT RATE SPEECH COMPRESSION FOR PLAYBACK IN SPEECH RECOGNITION

In this paper we describe novel, low-complexity, low-bit-rate speech compression and decompression methods for use in systems where automatic speech recognition is performed. The coding scheme, referred to as the Recognition Compatible Voice Coder (RECOVC), is based on encoding the pitch period together with the mel-frequency cepstral coefficients (MFCC) commonly used in large-vocabulary continuous speech recognition systems. The decoder reproduces natural-sounding, intelligible speech of good quality for playback purposes. Implementing a RECOVC scheme in a speech recognition system may simplify the playback procedure, because speech is reconstructed from feature vectors that have already been extracted and used for recognition. In distributed speech recognition systems, storage space or transmission bandwidth may be reduced by eliminating the need to store or transmit two separate bit streams, one for recognition and the other for playback.
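To make the encoder side of this idea concrete, the following is a minimal sketch of the two parameters the abstract says are encoded for each frame: an MFCC vector and a pitch period. It is not the RECOVC implementation. The abstract gives no frame length, filter count, number of coefficients, quantizer, or pitch algorithm, so the function names, the defaults (`n_filters=24`, `n_ceps=13`, a 60 to 400 Hz pitch search range), and the plain autocorrelation pitch estimator are all assumptions chosen for illustration.

```python
import numpy as np


def hz_to_mel(f):
    # Common mel-scale formula; the source does not say which variant is used.
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters with centers spaced evenly on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    bank = np.zeros((n_filters, freqs.size))
    for i in range(n_filters):
        lo, mid, hi = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank


def mfcc_frame(frame, sample_rate, n_filters=24, n_ceps=13):
    """MFCC vector of one frame (assumed defaults, not taken from the paper)."""
    windowed = frame * np.hamming(frame.size)
    power = np.abs(np.fft.rfft(windowed)) ** 2
    energies = mel_filterbank(n_filters, frame.size, sample_rate) @ power
    log_energies = np.log(energies + 1e-10)  # small floor avoids log(0)
    # Unnormalized DCT-II of the log mel energies.
    n = np.arange(n_filters)
    k = np.arange(n_ceps)[:, None]
    basis = np.cos(np.pi * k * (n + 0.5) / n_filters)
    return basis @ log_energies


def pitch_period(frame, sample_rate, f_min=60.0, f_max=400.0):
    """Pitch period in samples from the autocorrelation peak.

    A simple stand-in estimator, not the method used by the authors.
    """
    x = frame - frame.mean()
    ac = np.correlate(x, x, mode="full")[x.size - 1:]
    lag_min = int(sample_rate / f_max)
    lag_max = min(int(sample_rate / f_min), x.size - 1)
    return lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
```

Calling `mfcc_frame(frame, 8000)` and `pitch_period(frame, 8000)` on a 40 ms frame yields the per-frame parameter pair. A real coder would still have to quantize both parameters and handle unvoiced frames, and the decoder would have to reconstruct a waveform from them; none of those steps is shown here.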
