Statistical Model-Based Noise Reduction Approach for Car Interior Applications to Speech Recognition

This paper presents a statistical model-based noise suppression approach for speech recognition in a car environment. To alleviate the spectral whitening and signal distortion problems of the traditional decision-directed Wiener filter, we combine a decision-directed method with an original spectrum reconstruction method and develop a new two-stage noise reduction filter estimation scheme. When the tradeoff between performance and computational efficiency on resource-constrained automotive devices is considered, the ETSI standard advanced distributed speech recognition front-end (ETSI-AFE) can be an effective solution, and ETSI-AFE is also based on the decision-directed Wiener filter. We therefore conduct a series of speech recognition and computational complexity tests comparing the proposed approach with ETSI-AFE. The experimental results show that the proposed approach is superior to the conventional method in terms of speech recognition accuracy, while the computational cost and frame latency are significantly reduced.
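For readers unfamiliar with the baseline, the following is a minimal sketch of a conventional decision-directed Wiener gain computation, the building block that both ETSI-AFE and the proposed scheme start from. It is not the two-stage filter proposed here and not the ETSI-AFE implementation. The function name, the smoothing factor `alpha`, the gain floor, and the assumption of a fixed, known noise power spectrum are all illustrative choices that do not come from the paper.

```python
import numpy as np


def decision_directed_wiener(noisy_power, noise_power, alpha=0.98, gain_floor=0.1):
    """Per-frame Wiener gains using a decision-directed a priori SNR estimate.

    noisy_power : (frames, bins) array of noisy power spectra |Y|^2
    noise_power : (bins,) array, noise power spectrum (assumed known and fixed)
    alpha       : smoothing factor (illustrative value, not from the paper)
    gain_floor  : lower bound on the gain (illustrative value, not from the paper)
    """
    noisy_power = np.asarray(noisy_power, dtype=float)
    # Guard against division by zero in the SNR computations.
    noise_power = np.maximum(np.asarray(noise_power, dtype=float), 1e-12)

    gains = np.empty_like(noisy_power)
    prev_clean_power = np.zeros_like(noise_power)  # clean-speech estimate of previous frame

    for t in range(noisy_power.shape[0]):
        # A posteriori SNR of the current frame.
        gamma = noisy_power[t] / noise_power
        # Decision-directed a priori SNR: blend the previous clean-speech
        # estimate with the current instantaneous SNR estimate.
        xi = (alpha * prev_clean_power / noise_power
              + (1.0 - alpha) * np.maximum(gamma - 1.0, 0.0))
        # Wiener gain, floored to limit over-suppression.
        gain = np.maximum(xi / (1.0 + xi), gain_floor)
        gains[t] = gain
        # Clean-speech power estimate fed back to the next frame.
        prev_clean_power = gain ** 2 * noisy_power[t]

    return gains


if __name__ == "__main__":
    noise = np.ones(4)
    noise_only = decision_directed_wiener(np.ones((5, 4)), noise)
    high_snr = decision_directed_wiener(100.0 * np.ones((50, 4)), noise)
    print(noise_only[-1], high_snr[-1])
```

In this sketch, frames that contain only noise stay at the gain floor, while frames with sustained high SNR drive the gain toward one. The recursive dependence on the previous frame's estimate is what gives the decision-directed rule its smoothing behaviour.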
