Robust speech recognition by using spectral subtraction with noise peak shifting

In this study, a novel technique that recovers the temporal structure of the speech power spectrum is proposed. The histogram of the average speech log power spectrum shows that noise contamination shifts the noise peak, which in turn degrades the performance of speech recognition systems. A two-step scheme is proposed to weaken the effect of noise by first reducing the noise variance and then shifting the noise mean. The proposed algorithm consists of two parts, two-dimensional smoothing and controlled noise subtraction, which leads to the name SNS. It resolves the discontinuity in the speech probability distribution function that traditional spectral-subtraction-type algorithms introduce. In contrast to clean speech estimation methods, the proposed algorithm does not need a prior statistical model of speech or noise, which makes it simple but effective. Its effectiveness is tested on the AURORA2 database. The results are very promising: the average result for noisy speech over signal-to-noise ratios from 0 to 20 dB is 88.59%. The algorithm is compared against eight state-of-the-art speech recognition algorithms and, overall, produces significant improvements over them.
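The two-step scheme described above (reduce the noise variance by two-dimensional smoothing, then shift the noise mean by controlled subtraction) can be illustrated with a minimal sketch. The abstract does not specify the smoothing kernel, the noise estimator, or the subtraction rule, so everything below is an assumption made for illustration: a moving average over time and frequency, a noise estimate taken from the first few frames, and a generic over-subtraction factor `alpha` with a proportional floor `beta`. The function names `smooth_2d`, `controlled_subtract`, and `sns_sketch` are hypothetical and are not taken from the paper.

```python
from statistics import mean


def smooth_2d(power, half_t=1, half_f=1):
    """Moving average over time (rows) and frequency (columns).

    Assumed stand-in for the paper's two-dimensional smoothing step;
    the window is truncated at the edges of the spectrogram.
    """
    n_t, n_f = len(power), len(power[0])
    out = []
    for t in range(n_t):
        row = []
        for f in range(n_f):
            window = [
                power[i][j]
                for i in range(max(0, t - half_t), min(n_t, t + half_t + 1))
                for j in range(max(0, f - half_f), min(n_f, f + half_f + 1))
            ]
            row.append(mean(window))
        out.append(row)
    return out


def controlled_subtract(power, noise_frames=2, alpha=0.8, beta=0.1):
    """Subtract a scaled per-band noise estimate, keeping every bin positive.

    Assumptions: the first `noise_frames` frames contain noise only, and the
    result is floored at beta * (input power) rather than clipped to zero,
    so no bin collapses to a single constant value.
    """
    n_f = len(power[0])
    noise = [mean(frame[f] for frame in power[:noise_frames]) for f in range(n_f)]
    return [
        [max(p - alpha * n, beta * p) for p, n in zip(frame, noise)]
        for frame in power
    ]


def sns_sketch(power):
    """Smoothing followed by controlled subtraction, in that order."""
    return controlled_subtract(smooth_2d(power))


if __name__ == "__main__":
    # Toy power spectrogram: 4 frames x 3 bands of flat noise,
    # with one high-energy "speech" bin in the last frame.
    spec = [[4.0, 4.0, 4.0] for _ in range(4)]
    spec[3][1] = 100.0
    for frame in sns_sketch(spec):
        print([round(v, 3) for v in frame])
```

The proportional floor is one common way to avoid the hard clipping used in basic spectral subtraction; whether it matches the paper's control rule cannot be determined from the abstract alone.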
