A New Subband-Weighted MVDR-Based Front-End for Robust Speech Recognition

This paper presents a novel noise-robust feature extraction method for speech recognition, based on making the Minimum Variance Distortionless Response (MVDR) power spectrum estimate robust against noise. Robustness is obtained by modifying the distortionless constraint of MVDR spectral estimation: the sub-band power spectrum values are weighted according to the sub-band signal-to-noise ratios (SNRs). The optimum weighting is derived from experimental findings in psychoacoustics. According to our experiments, this technique successfully modifies the power spectrum of speech signals and makes it robust against noise. When evaluated on the Aurora 2 recognition task, the method outperformed both the baseline MFCC features and the MVDR-based features under different noisy conditions.
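
For orientation, the following is a minimal sketch of the standard (unmodified) MVDR power spectrum that the proposed front-end starts from. It is not the authors' method: the sub-band SNR weighting of the distortionless constraint is not reproduced here, because the abstract does not specify it. The function name `mvdr_spectrum`, the model order, the frequency grid size, and the small diagonal loading term are all assumptions made for illustration. The estimate is P(w) = 1 / (v(w)^H R^{-1} v(w)), where R is the Toeplitz autocorrelation matrix of the frame and v(w) is the steering vector at frequency w.

```python
import numpy as np

def mvdr_spectrum(frame, order=12, n_freqs=129):
    """Standard MVDR (Capon) power spectrum of a single frame.

    Hypothetical helper for illustration; `order` and `n_freqs`
    are assumed defaults, not values taken from the paper.
    """
    x = np.asarray(frame, dtype=float)
    n = len(x)

    # Biased autocorrelation estimates r[0..order].
    r = np.array([np.dot(x[:n - k], x[k:]) / n for k in range(order + 1)])

    # Symmetric Toeplitz autocorrelation matrix R[i, j] = r[|i - j|].
    lags = np.arange(order + 1)
    R = r[np.abs(lags[:, None] - lags[None, :])]

    # Small diagonal loading for numerical stability (an assumption).
    R = R + 1e-9 * r[0] * np.eye(order + 1)
    R_inv = np.linalg.inv(R)

    # Steering vectors v(w) = [1, e^{jw}, ..., e^{j*order*w}], one per row.
    omegas = np.linspace(0.0, np.pi, n_freqs)
    V = np.exp(1j * np.outer(omegas, lags))

    # Quadratic form v^H R^{-1} v for every frequency on the grid.
    denom = np.einsum('fi,ij,fj->f', V.conj(), R_inv, V).real
    return omegas, 1.0 / denom


if __name__ == "__main__":
    # Usage: a noisy sinusoid at 0.25*pi rad/sample.
    rng = np.random.default_rng(0)
    t = np.arange(400)
    frame = np.sin(0.25 * np.pi * t) + 0.01 * rng.standard_normal(400)
    omegas, power = mvdr_spectrum(frame)
    print("peak frequency (rad/sample):", omegas[np.argmax(power)])
```

In the standard formulation the filter is constrained to pass each frequency with unit gain (the distortionless constraint). The method described above alters that constraint per sub-band, which this sketch deliberately leaves out.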
