Robust Feature Extraction Based on Teager-Entropy and Half Power Spectrum Estimation for Speech Recognition

In this paper, we present a robust feature extraction scheme for speech recognition. Compared with standard mel-frequency cepstral coefficients (MFCC), it incorporates perceptual information into the half power spectrum rather than the whole classical spectrum, and combines the result with Teager-Entropy to construct a new feature vector. We compare its performance with several techniques and present a detailed comparative analysis across various noise types and a wide range of SNR values. The results suggest that the proposed feature achieves superior robustness with an HMM-based recognizer on an English digit task: an 8.87% reduction in average error rate is obtained relative to ordinary MFCC. Furthermore, the results show that the half power spectrum-based method outperforms the whole power spectrum-based method in most of the tested environments.
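
The abstract does not define how Teager-Entropy is computed, so the following is only an illustrative sketch and not the authors' method. It assumes the standard discrete Teager energy operator, psi[n] = x[n]^2 - x[n-1]*x[n+1], followed by the Shannon entropy of the normalized energy values within one frame. The function names `teager_energy` and `shannon_entropy`, the test frame, and the choice to normalize magnitudes are all assumptions made for illustration.

```python
import math


def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1]*x[n+1].

    Returns len(x) - 2 values, since the two end samples lack a neighbour.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def shannon_entropy(values):
    """Shannon entropy (in nats) of the magnitudes of `values`,
    normalized so that they sum to 1 (an assumed normalization)."""
    mags = [abs(v) for v in values]
    total = sum(mags)
    if total == 0.0:
        return 0.0
    return -sum((m / total) * math.log(m / total) for m in mags if m > 0.0)


# Hypothetical test frame: a pure sinusoid A*cos(omega*n).
amplitude, omega = 2.0, 0.3
frame = [amplitude * math.cos(omega * n) for n in range(64)]

psi = teager_energy(frame)      # constant A^2 * sin(omega)^2 for a sinusoid
entropy = shannon_entropy(psi)  # entropy of a flat energy profile
```

For a pure sinusoid the Teager energy is constant at A^2 sin^2(omega), so the normalized profile is uniform and the entropy reaches its maximum, log(N - 2) for a frame of N samples. A frame whose energy is concentrated in a few samples would yield a lower value under this definition.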
