A Novel Robust MFCC Extraction Method Using Sample-ISOMAP for Speech Recognition

Motivated by the nonlinear characteristics of the speech signal, this paper presents a novel robust MFCC extraction method using sample-ISOMAP. ISOMAP is a nonlinear dimensionality reduction method based on manifold theory; it can reveal the meaningful low-dimensional structure hidden in high-dimensional observations. In the proposed method, ISOMAP is first applied to compute a nonlinear mapping matrix from the consistency mixed matrix, which is composed of the logarithms of the Mel filter bank energies derived from the sample data. The nonlinear mapping matrix then replaces the DCT step of the classic MFCC method. Experiments with a recognition system built with HTK3.3 on the Aurora2.0 speech database show that the proposed method is more robust than the PCA-MFCC and MFCC methods, and that the recognition rate improves notably at low SNRs.
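The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not say how the mapping matrix is obtained from the ISOMAP result, so the least-squares fit in `fit_mapping_matrix` is an assumption, as are all function names, the random stand-in data, and the parameter values (23 filters, 13 coefficients, 10 neighbours). The ISOMAP part follows the standard recipe: k-nearest-neighbour graph, shortest-path geodesic distances, then classical MDS.

```python
import numpy as np

def dct_matrix(n_ceps, n_filters):
    """DCT-II basis of classic MFCC; rows index cepstral coefficients."""
    k = np.arange(n_ceps)[:, None]
    m = np.arange(n_filters)[None, :]
    return np.cos(np.pi * k * (m + 0.5) / n_filters)

def isomap_embedding(X, n_neighbors, n_components):
    """Standard ISOMAP: k-NN graph -> geodesic distances -> classical MDS."""
    n = X.shape[0]
    # Pairwise Euclidean distances between sample vectors.
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    # Keep only edges to the nearest neighbours (column 0 is the point itself).
    graph = np.full((n, n), np.inf)
    idx = np.argsort(d, axis=1)[:, 1:n_neighbors + 1]
    rows = np.repeat(np.arange(n), n_neighbors)
    graph[rows, idx.ravel()] = d[rows, idx.ravel()]
    graph = np.minimum(graph, graph.T)  # symmetrise the neighbourhood graph
    np.fill_diagonal(graph, 0.0)
    # Floyd-Warshall shortest paths approximate geodesic distances.
    for k in range(n):
        graph = np.minimum(graph, graph[:, k:k + 1] + graph[k:k + 1, :])
    if not np.isfinite(graph).all():
        raise ValueError("neighbourhood graph is disconnected; raise n_neighbors")
    # Classical MDS on the squared geodesic distances.
    J = np.eye(n) - 1.0 / n
    B = -0.5 * J @ (graph ** 2) @ J
    w, v = np.linalg.eigh(B)
    order = np.argsort(w)[::-1][:n_components]
    return v[:, order] * np.sqrt(np.maximum(w[order], 0.0))

def fit_mapping_matrix(X, Y):
    """ASSUMPTION: fit a matrix W so that (X - mean) @ W approximates Y."""
    mean = X.mean(axis=0)
    W, *_ = np.linalg.lstsq(X - mean, Y, rcond=None)
    return mean, W

# Random stand-in for the log Mel filter bank energies of the sample data
# (60 frames, 23 filters); real features would come from a speech front end.
rng = np.random.default_rng(0)
log_mel = rng.normal(size=(60, 23))

Y = isomap_embedding(log_mel, n_neighbors=10, n_components=13)
mean, W = fit_mapping_matrix(log_mel, Y)

features = (log_mel - mean) @ W            # mapping matrix in place of the DCT
mfcc = log_mel @ dct_matrix(13, 23).T      # classic DCT step, for comparison
```

Both `features` and `mfcc` have one 13-dimensional vector per frame, so the learned mapping is a drop-in replacement for the DCT stage in this sketch. New frames can be projected with the same `mean` and `W` without rerunning ISOMAP.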
