Exploiting principal component analysis in modulation spectrum enhancement for robust speech recognition

In this paper, we present a novel method, based on principal component analysis (PCA), for improving the noise robustness of speech features. PCA is used to extract a set of basis spectral vectors from the modulation spectra of clean training speech features. The new modulation spectra, constructed by mapping the original modulation spectra into the space spanned by these PCA-derived basis vectors, are shown to be robust against noise distortion. Experiments conducted on the Aurora-2 digit string database show that the proposed PCA-based approach, combined with mean and variance normalization (MVN), provides average relative error reductions of over 65% and 12% compared with the baseline MFCC system and with MVN alone, respectively.
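The mapping described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no processing details, so the sketch assumes that the modulation spectrum is the DFT of each feature's temporal trajectory, that PCA is applied to the magnitude spectra only, that the original phase is reused, and that the feature stream is recovered by an inverse DFT. The function names, `n_fft`, and `n_components` are hypothetical.

```python
import numpy as np

def modulation_spectrum(trajectory, n_fft):
    """Magnitude and phase of the DFT of one feature's temporal trajectory."""
    spec = np.fft.rfft(trajectory, n=n_fft)
    return np.abs(spec), np.angle(spec)

def fit_pca_basis(clean_magnitudes, n_components):
    """Learn a mean vector and PCA basis from clean magnitude spectra.

    clean_magnitudes: array of shape (n_utterances, n_fft // 2 + 1).
    Returns the mean and the leading principal directions as rows.
    """
    mean = clean_magnitudes.mean(axis=0)
    centered = clean_magnitudes - mean
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]

def enhance_trajectory(trajectory, mean, basis, n_fft):
    """Project the magnitude spectrum onto the PCA subspace and resynthesize."""
    mag, phase = modulation_spectrum(trajectory, n_fft)
    # Orthogonal projection onto the affine subspace mean + span(basis).
    projected = mean + (mag - mean) @ basis.T @ basis
    # Assumption: clip negative values, since a magnitude cannot be negative.
    projected = np.maximum(projected, 0.0)
    new_spec = projected * np.exp(1j * phase)
    # Assumption: keep only as many frames as the input trajectory had.
    return np.fft.irfft(new_spec, n=n_fft)[: len(trajectory)]
```

In this sketch, `fit_pca_basis` would be run once per feature dimension on clean training utterances, and `enhance_trajectory` would then be applied to each dimension of a test utterance, for instance after MVN. The number of retained components controls how strongly the spectrum is constrained toward the clean subspace.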
