Integration of Phoneme-Subspaces Using ICA for Speech Feature Extraction and Recognition

In our previous work, using PCA instead of the DCT proved robust for distorted speech recognition because the main speech element is projected onto low-order features, while the noise or distortion element is projected onto high-order features [1]. This paper introduces a new feature extraction technique that captures the correlation information among phoneme subspaces and yields features whose elements are statistically mutually independent. The proposed speech feature vector is generated by projecting the observed vector onto an integrated space obtained by PCA and ICA. The performance evaluation shows that the proposed method achieves higher isolated-word recognition accuracy than conventional methods in some reverberant conditions.
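The projection onto an integrated PCA/ICA space can be illustrated with a minimal NumPy sketch. It rests on assumptions the abstract does not spell out: each phoneme subspace is taken to be the top principal axes of that phoneme's frames, the subspaces are integrated by concatenating the per-phoneme projections and whitening them with PCA, and a symmetric FastICA with a tanh nonlinearity stands in for the ICA step. All function names and parameters (`pca_basis`, `build_integrated_projection`, `k_sub`, `k_out`) are hypothetical and not taken from the paper.

```python
import numpy as np

def pca_basis(frames, k):
    """Top-k principal axes (as columns) of frames shaped (n_frames, dim)."""
    centered = frames - frames.mean(axis=0)
    cov = centered.T @ centered / (len(frames) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]       # keep the k largest
    return eigvecs[:, order]

def whiten(x, k):
    """PCA whitening of x (n, d) down to k dims: returns (mean, matrix (d, k))."""
    mean = x.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(x - mean, rowvar=False))
    order = np.argsort(eigvals)[::-1][:k]
    return mean, eigvecs[:, order] / np.sqrt(eigvals[order])

def sym_orth(w):
    """Symmetric orthogonalization (W W^T)^(-1/2) W, computed via SVD."""
    u, _, vt = np.linalg.svd(w)
    return u @ vt

def fast_ica(z, n_iter=200, tol=1e-6, seed=0):
    """Symmetric FastICA (tanh) on whitened data z (n, k); returns W (k, k)."""
    rng = np.random.default_rng(seed)
    n, k = z.shape
    w = sym_orth(rng.standard_normal((k, k)))
    for _ in range(n_iter):
        g = np.tanh(z @ w.T)                    # nonlinearity applied to sources
        g_prime_mean = (1.0 - g ** 2).mean(axis=0)
        w_new = sym_orth(g.T @ z / n - g_prime_mean[:, None] * w)
        # Converged when every new row is parallel to the corresponding old row.
        delta = np.max(np.abs(np.abs(np.sum(w_new * w, axis=1)) - 1.0))
        w = w_new
        if delta < tol:
            break
    return w

def build_integrated_projection(frames_by_phoneme, k_sub, k_out):
    """Assumed pipeline: per-phoneme PCA, concatenation, PCA whitening, ICA."""
    frames = list(frames_by_phoneme.values())
    stacked = np.hstack([pca_basis(f, k_sub) for f in frames])  # (dim, P * k_sub)
    sub_feats = np.vstack(frames) @ stacked     # projections onto all subspaces
    mean, w_white = whiten(sub_feats, k_out)    # k_out must not exceed dim
    w_ica = fast_ica((sub_feats - mean) @ w_white)

    def project(x):
        """Map observed vectors x (n, dim) to k_out-dimensional features."""
        return ((x @ stacked - mean) @ w_white) @ w_ica.T

    return project
```

A call such as `project = build_integrated_projection({"a": frames_a, "i": frames_i}, k_sub=3, k_out=4)` returns a function that maps observed vectors to the integrated features. Because the ICA rotation is orthogonal and is applied after whitening, the resulting features are decorrelated with unit variance on the training data. The statistical independence itself is only approximated by the FastICA iterations.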

[1] Tetsuya Takiguchi, et al. Robust Feature Extraction using Kernel PCA, 2006, IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[2] Hynek Hermansky, et al. Multiresolution channel normalization for ASR in reverberant environments, 1997, EUROSPEECH.

[3] Lin-Shan Lee, et al. Improved MFCC feature extraction by PCA-optimized filter-bank for speech recognition, 2001, IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU '01).

[4] Kazuya Takeda, et al. Two-stage noise spectra estimation and regression based in-car speech recognition using single distant microphone, 2005, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '05).

[5] Jean-Marc Vesin, et al. Single channel speech enhancement using principal component analysis and MDL subspace selection, 1999, EUROSPEECH.

[6] Hugo Van hamme, et al. A Review of Signal Subspace Speech Enhancement and Its Application to Noise Robust Speech Recognition, 2007, EURASIP J. Adv. Signal Process.

[7] Hynek Hermansky, et al. RASTA processing of speech, 1994, IEEE Trans. Speech Audio Process.

[8] Erkki Oja, et al. Independent component analysis: algorithms and applications, 2000, Neural Networks.

[9] Satoshi Nakamura, et al. Acoustical Sound Database in Real Environments for Sound Scene Understanding and Hands-Free Speech Recognition, 2000, LREC.

[10] Oh-Wook Kwon, et al. Phoneme recognition using ICA-based feature extraction and transformation, 2004, Signal Process.

[11] Tomohiro Nakatani, et al. Efficient blind dereverberation framework for automatic speech recognition, 2005, INTERSPEECH.

[12] Jen-Tzung Chien, et al. Factor analysis of acoustic features for streamed hidden Markov modeling, 2007, IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU).