Deconvolution of telephone line effects for speech recognition

Abstract This paper presents a new approach to equalizing telephone line effects in the transmitted signal, with the aim of improving the performance of speech recognition systems. The approach implements a blind equalization scheme in which an adaptive filter, using known statistics of the signals, deconvolves the channel from the transmitted signal. Measurements carried out on actual telephone data confirm that telephone lines introduce disturbing convolutional components into speech signals. Line effects are almost constant within a given call but vary from call to call. The proposed adaptive filtering of telephone line effects is compared with two conventional techniques: subtraction of the long-term cepstrum and highpass filtering of cepstral trajectories. Recognition experiments are carried out on several telephone databases in speaker-independent mode. The results show that reducing channel effects significantly improves recognition performance. In terms of error rates, the proposed adaptive filter performs better than the conventional highpass filters. However, it does not perform as well as the off-line cepstral subtraction technique, in which the long-term cepstrum is estimated over several recordings of a call. Experiments were also conducted to measure the amount of speech data needed to obtain a reliable estimate of channel effects. Averaging cepstral vectors over a few seconds of speech produces a reliable estimate of the constant convolutional perturbation.
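
As an illustration of the two conventional techniques named above, the following is a minimal sketch, not the paper's implementation. It assumes the cepstra are stored as a NumPy array of shape (frames, coefficients) and relies on the standard property that a constant convolutional channel becomes an additive offset in the cepstral domain. The function names, the first-order filter form, and the coefficient `alpha=0.97` are illustrative assumptions. The paper's adaptive blind equalization filter is not reproduced here, since the abstract does not specify it.

```python
import numpy as np


def cepstral_mean_subtraction(cepstra):
    """Subtract the long-term cepstrum (mean over all frames) from every frame."""
    cepstra = np.asarray(cepstra, dtype=float)
    return cepstra - cepstra.mean(axis=0, keepdims=True)


def highpass_trajectories(cepstra, alpha=0.97):
    """Highpass each cepstral trajectory along time.

    Assumed first-order filter: y[t] = x[t] - x[t-1] + alpha * y[t-1].
    It has a zero at DC, so a constant offset decays as alpha**t.
    """
    cepstra = np.asarray(cepstra, dtype=float)
    out = np.zeros_like(cepstra)
    prev_x = np.zeros(cepstra.shape[1])
    prev_y = np.zeros(cepstra.shape[1])
    for t, x in enumerate(cepstra):
        prev_y = x - prev_x + alpha * prev_y
        prev_x = x
        out[t] = prev_y
    return out


def estimate_channel(cepstra, n_frames):
    """Estimate the constant channel term by averaging the first n_frames."""
    cepstra = np.asarray(cepstra, dtype=float)
    return cepstra[:n_frames].mean(axis=0)


# Synthetic stand-in for one call (hypothetical data, not telephone speech):
# zero-mean "speech" cepstra plus a constant per-call channel offset.
rng = np.random.default_rng(0)
speech = rng.normal(size=(500, 12))
channel = rng.normal(size=12)
observed = speech + channel

normalized = cepstral_mean_subtraction(observed)
filtered = highpass_trajectories(observed)
channel_hat = estimate_channel(observed, n_frames=300)
```

Mean subtraction needs the whole call (or several recordings of it) before it can be applied, which corresponds to the off-line setting described in the abstract. The highpass filter runs frame by frame and removes the offset only gradually.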
