Robust Speech Feature Extraction Using the Hilbert Transform Spectrum Estimation Method

The performance of the traditional mel-frequency cepstral coefficient (MFCC) speech feature extraction method degrades drastically in complex noisy environments. To improve the performance and robustness of speech recognition systems, the MVDR-MFCC (Minimum Variance Distortionless Response MFCC) feature extraction method, which is based on spectral envelope estimation, was previously proposed. However, the computational complexity of MVDR-MFCC is very high. In this paper, we propose the MHCC (Hilbert-MFCC) feature extraction method for speech, which introduces the Hilbert transform into the MFCC process. Experiments under 8 different noisy environments indicate that, compared with the MVDR-MFCC feature extraction method, the proposed method not only reduces the algorithm's complexity significantly but is also less affected by noise, achieving a significant improvement in robustness: the average recognition rate across different noise types and SNRs increases by 12%.
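The abstract does not say where the Hilbert transform enters the MFCC pipeline, so the sketch below illustrates only the transform itself, not the MHCC method. It builds the analytic signal of one frame with the standard frequency-domain construction and takes its magnitude as the envelope. The function names (`dft`, `analytic_signal`, `hilbert_envelope`) and the toy frame are assumptions made for illustration, and the naive DFT is used only to keep the example dependency-free.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (adequate for short frames)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def analytic_signal(frame):
    """Return the analytic signal x + j*H{x} of a real-valued frame.

    Keeps DC (and the Nyquist bin for even n), doubles the positive
    frequencies, and zeroes the negative frequencies.
    """
    n = len(frame)
    spectrum = dft(frame)
    gain = [0.0] * n
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
        upper = n // 2
    else:
        upper = (n + 1) // 2
    for k in range(1, upper):
        gain[k] = 2.0
    return dft([s * g for s, g in zip(spectrum, gain)], inverse=True)


def hilbert_envelope(frame):
    """Magnitude of the analytic signal, i.e. the Hilbert envelope."""
    return [abs(z) for z in analytic_signal(frame)]


# Toy frame (hypothetical data): a pure tone with exactly 4 cycles in 64 samples.
frame = [math.cos(2 * math.pi * 4 * t / 64) for t in range(64)]
envelope = hilbert_envelope(frame)
```

For this unit-amplitude tone the envelope is flat, since the analytic signal of a cosine is a complex exponential. In practice a library routine such as `scipy.signal.hilbert` would replace the naive DFT; how the resulting envelope or spectrum is combined with mel filtering in MHCC is described in the paper body, not here.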
