Mel-Wiener Filter for Mel-LPC Based Speech Recognition

This paper proposes a Mel-Wiener filter to enhance Mel-LPC spectra in the presence of additive noise. The transfer function of the proposed filter is defined by using a first-order all-pass filter instead of unit delay. The filter coefficients are estimated based on minimization of the sum of the square error on the linear frequency scale without applying the bilinear transformation and efficiently implemented in the autocorrelation domain. The proposed filter does not require any time-frequency conversion, which saves a large amount of computational load. The performance of the proposed system is comparable to that of ETSI AFE. The optimum filter order is found to be 3, and thus filtering is computationally inexpensive. The computational cost of the proposed system except VAD is 53% of ETSI AFE.

[1]  Hiroshi Matsumoto,et al.  An efficient mel-LPC analysis method for speech recognition , 1998, ICSLP.

[2]  Jérôme Boudy,et al.  Experiments with a nonlinear spectral subtractor (NSS), Hidden Markov models and the projection, for robust speech recognition in cars , 1991, Speech Commun..

[3]  H Hermansky,et al.  Perceptual linear predictive (PLP) analysis of speech. , 1990, The Journal of the Acoustical Society of America.

[4]  Alan V. Oppenheim,et al.  Discrete representation of signals , 1972 .

[5]  David Pearce,et al.  The aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions , 2000, INTERSPEECH.

[6]  Hiroshi Matsumoto,et al.  Evaluation of mel-LPC cepstrum in a large vocabulary continuous speech recognition , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).

[7]  H. Strube Linear prediction on a warped frequency scale , 1980 .

[8]  Denis Jouvet,et al.  Evaluation of a noise-robust DSR front-end on Aurora databases , 2002, INTERSPEECH.

[9]  Jinyu Li,et al.  A complexity reduction of ETSI advanced front-end for DSR , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[10]  Stan Davis,et al.  Comparison of Parametric Representations for Monosyllabic Word Recognition in Continuously Spoken Se , 1980 .

[11]  S. Boll,et al.  Suppression of acoustic noise in speech using spectral subtraction , 1979 .

[12]  Nathalie Virag Speech enhancement based on masking properties of the auditory system , 1995, 1995 International Conference on Acoustics, Speech, and Signal Processing.

[13]  B.-H. Juang,et al.  On the hidden Markov model and dynamic time warping for speech recognition — A unified view , 1984, AT&T Bell Laboratories Technical Journal.