Noise Aware Sub-band Locality Preserving Projection for Robust Speech Recognition

Recovering the nonlinear low-dimensional embedding of speech signals recorded in clean environments using manifold learning techniques has recently attracted substantial interest. However, manifold learning for feature transformation can behave quite differently when the speech is corrupted by noise. We address this issue with a new approach for reducing the effect of noise on the individual Mel Frequency Cepstral Coefficients (MFCCs), and hence on the Mel sub-bands. The method is formulated within the framework of Locality Preserving Projection (LPP), a manifold learning technique: we construct a manifold on each Mel sub-band while taking into account the effect of noise on that sub-band. Specifically, we learn one manifold for each MFCC, and hence for each Mel sub-band, from noisy speech; we call this method sub-band LPP. Experimental results on the AURORA-2 database show that noise-aware sub-band LPP improves the recognition rate for noisy speech compared with conventional LPP at SNR values greater than 0 dB.
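
For orientation, the conventional LPP that the abstract builds on can be sketched as follows. This is a minimal illustration of standard LPP (k-nearest-neighbour graph, heat-kernel weights, generalized eigenproblem), not the authors' noise-aware variant. The abstract does not say how per-sub-band feature vectors are formed or how noise enters the graph, so the function names `lpp` and `fit_per_band`, the parameters `n_neighbors`, `t`, and `reg`, and the idea of passing one feature matrix per sub-band are all assumptions made for this example.

```python
import numpy as np
from scipy.linalg import eigh


def lpp(X, n_components=2, n_neighbors=5, t=1.0, reg=1e-6):
    """Standard LPP sketch. X has shape (n_samples, n_features).

    Returns a projection matrix of shape (n_features, n_components).
    """
    n, d = X.shape
    # Pairwise squared Euclidean distances between samples.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    # k nearest neighbours of each sample (column 0 is normally the sample itself).
    idx = np.argsort(d2, axis=1)[:, 1:n_neighbors + 1]
    rows = np.repeat(np.arange(n), n_neighbors)
    cols = idx.ravel()
    # Heat-kernel affinities on the neighbourhood graph, then symmetrize.
    W = np.zeros((n, n))
    W[rows, cols] = np.exp(-d2[rows, cols] / t)
    W = np.maximum(W, W.T)
    D = np.diag(W.sum(axis=1))
    L = D - W  # graph Laplacian
    # Generalized eigenproblem X^T L X a = lambda X^T D X a.
    # The small ridge term `reg` (an assumption) keeps the right-hand matrix positive definite.
    lhs = X.T @ L @ X
    rhs = X.T @ D @ X + reg * np.eye(d)
    _, vecs = eigh(lhs, rhs)  # eigenvalues are returned in ascending order
    return vecs[:, :n_components]  # directions with the smallest eigenvalues


def fit_per_band(band_features, **kwargs):
    """Hypothetical helper: fit one LPP projection per sub-band feature matrix."""
    return [lpp(Xb, **kwargs) for Xb in band_features]
```

A projection is applied as `Y = X @ A`, where `A` is the matrix returned by `lpp`. Fitting one projection per sub-band, as in `fit_per_band`, only mirrors the "one manifold per sub-band" idea in outline; a noise-aware construction would differ in how the graph is built.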
