A novel spectro-temporal feature extraction method for phoneme classification

In this paper, we propose a new feature extraction method inspired by a model of auditory cortical processing. The output of the cortical model is a 4-D spectro-temporal representation of the sound, in which each point indicates the amount of energy at the corresponding time, frequency, rate, and scale. In the proposed model, one appropriate rate and one appropriate scale are selected, reducing the output of the cortical model from a 4-D space to a 2-D space. In most ASR systems, an HMM classifier is used to handle the variable-length problem after a framing procedure; this framing affects the feature extraction stage and spoils the temporal information of the phoneme signal at the feature level. In the proposed model, this problem is handled in the feature extraction stage itself: fixed-length features are obtained by analyzing the spectro-temporal space of each phoneme. Since the resulting feature vector has a fixed dimension, we can use a classical classifier such as a support vector machine for the phoneme classification task. To evaluate the performance of the proposed model, we performed phoneme classification on seven subsets of the TIMIT corpus. On consonants and vowels, the proposed model achieved average performance improvements of 5.15% and 9.65% relative to the HMM-MFCC+AMFCC approach, and of 8.7% and 2.68% relative to the SVM-MFCC approach, respectively.
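The pipeline described above can be sketched as follows. This is a minimal illustrative example, not the paper's implementation: the cortical representation is simulated with random data, the rate/scale indices and the segment-averaging scheme for producing fixed-length vectors are assumptions introduced here for illustration.

```python
# Hypothetical sketch: fix one rate and one scale in a 4-D cortical
# representation (time, freq, rate, scale) to obtain a 2-D time-frequency
# slice, then derive a fixed-length feature vector usable by an SVM.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def fixed_length_features(cortical, rate_idx, scale_idx, n_frames=8):
    """cortical: 4-D array (time, freq, rate, scale).
    Returns a fixed-length vector regardless of the time dimension."""
    slice_2d = cortical[:, :, rate_idx, scale_idx]      # 2-D (time, freq)
    # Average the variable-length time axis over n_frames equal segments,
    # so every phoneme yields a vector of size n_frames * n_freq.
    segments = np.array_split(slice_2d, n_frames, axis=0)
    return np.concatenate([seg.mean(axis=0) for seg in segments])

# Toy two-class "phoneme" data of varying duration (stand-in for TIMIT tokens)
X, y = [], []
for label in (0, 1):
    for _ in range(20):
        t = int(rng.integers(30, 60))                   # variable frame count
        cort = rng.normal(loc=label, size=(t, 16, 4, 4))
        X.append(fixed_length_features(cort, rate_idx=2, scale_idx=1))
        y.append(label)
X = np.vstack(X)

# Because every vector has the same dimension, a classical SVM applies directly.
clf = SVC(kernel="rbf").fit(X, y)
print(X.shape, clf.score(X, y))
```

The key point the sketch captures is that the variable-length problem is resolved before classification, so no HMM-style temporal modeling is needed at the classifier stage.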