Auditory-model based robust feature selection for speech recognition.

It is shown that robust dimension reduction of a feature set for speech recognition can be based on a model of the human auditory system. Whereas conventional methods optimize classification performance directly, the proposed method exploits knowledge implicit in the auditory periphery and inherits its robustness. Features are selected to maximize the similarity between the Euclidean geometry of the feature domain and that of the perceptual domain. Recognition experiments using mel-frequency cepstral coefficients (MFCCs) confirm the effectiveness of the approach, which does not require labeled training data. For noisy data, the method outperforms commonly used discriminant-analysis-based dimension-reduction methods that rely on labeling. The results also indicate that selecting MFCCs in their natural order yields subsets with good performance.
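The selection criterion described above — choosing the feature subset whose Euclidean geometry best matches that of a perceptual (auditory-model) domain — can be illustrated with a minimal sketch. The greedy forward search and the use of Pearson correlation between pairwise-distance vectors as the geometric-similarity measure are illustrative assumptions here, not the paper's exact sensitivity-matrix-based criterion; `X` (frames × MFCC features) and `P` (the same frames mapped through an auditory model) are hypothetical inputs.

```python
import numpy as np
from scipy.spatial.distance import pdist

def select_features(X, P, n_select):
    """Greedily pick feature indices of X whose subset geometry best
    matches the perceptual-domain geometry of P.

    Similarity is measured as the Pearson correlation between the
    pairwise Euclidean distances in the selected feature subspace and
    those in the perceptual domain (an illustrative proxy for the
    paper's geometric-similarity criterion).
    """
    d_perc = pdist(P)  # pairwise distances in the perceptual domain
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(n_select):
        scores = []
        for j in remaining:
            d_feat = pdist(X[:, selected + [j]])
            scores.append((np.corrcoef(d_feat, d_perc)[0, 1], j))
        best = max(scores)[1]  # feature giving the highest correlation
        selected.append(best)
        remaining.remove(best)
    return selected

# Synthetic check: if the "perceptual" domain is built from features
# 0 and 2, the greedy search should recover exactly those features.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 6))
P = X[:, [0, 2]]
print(sorted(select_features(X, P, 2)))
```

Note that, as stated in the abstract, no class labels enter the criterion: only the unlabeled feature vectors and their images under the auditory model are needed.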
