Phoneme recognition using ICA-based feature extraction and transformation

We investigate the use of independent component analysis (ICA) for speech feature extraction in speech recognition systems. Although initial research suggested that learning basis functions by ICA to encode the speech signal efficiently improves recognition accuracy, we observe that this may hold only for recognition tasks with little training data. When compared on a large training database with standard speech recognition features such as the mel frequency cepstral coefficients (MFCCs), the ICA-adapted basis functions perform poorly. This is mainly due to the phase sensitivity of the learned speech basis functions and their lack of time-shift invariance. In contrast to image processing, phase information is not essential for speech recognition. We therefore propose a new scheme that removes the phase sensitivity by using an analytic description of the ICA-adapted basis functions obtained via the Hilbert transform. Furthermore, since the basis functions are not shift invariant, we extend the method with a frequency-based ICA stage that removes redundant time-shift information. The performance of the new feature is evaluated for phoneme recognition on the TIMIT speech database and compared with the standard MFCC feature. The phoneme recognition results show promising accuracy, comparable to that of the well-optimized MFCC features.
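To illustrate the phase-sensitivity argument, the following minimal sketch compares the response of a real-valued basis function with the magnitude response of its analytic counterpart obtained via the Hilbert transform. It is not the feature extraction pipeline evaluated in this work: the Gabor-like basis function, its parameters, and the helper names (`gabor_basis`, `real_response`, `envelope_response`) are assumptions chosen for illustration, standing in for an ICA-adapted basis function.

```python
import numpy as np
from scipy.signal import hilbert

def gabor_basis(n=256, freq=0.05, width=25.0):
    # Hypothetical stand-in for a learned basis function:
    # a Gaussian-windowed cosine (freq in cycles per sample).
    t = np.arange(n) - n / 2
    return np.exp(-0.5 * (t / width) ** 2) * np.cos(2 * np.pi * freq * t)

def real_response(basis, frame):
    # Plain inner product: depends on the phase of the input frame.
    return float(np.dot(basis, frame))

def envelope_response(basis, frame):
    # Analytic basis = basis + j * HilbertTransform(basis).
    # The magnitude of the complex projection discards the phase.
    analytic = hilbert(basis)
    return float(np.abs(np.vdot(analytic, frame)))

n = 256
t = np.arange(n) - n / 2
basis = gabor_basis(n)

# Same sinusoid presented at eight different phases.
phases = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)
frames = [np.cos(2 * np.pi * 0.05 * t + p) for p in phases]

real_out = np.array([real_response(basis, f) for f in frames])
env_out = np.array([envelope_response(basis, f) for f in frames])

print("real-valued responses:", np.round(real_out, 2))
print("envelope responses:   ", np.round(env_out, 2))
```

In this toy setting the real-valued responses swing between positive and negative values as the input phase changes, while the envelope responses stay essentially constant. The sketch addresses only the phase of the input; it does not model the frequency-based ICA stage that handles time shifts.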
