Dimensionality Reduction for Emotional Speech Recognition

The number of speech features proposed for emotional speech recognition runs into the thousands, which makes dimensionality reduction an indispensable part of an emotional speech recognition system. We consider the application of three recently developed dimensionality reduction algorithms to this problem: the elastic net, greedy feature selection, and supervised principal component analysis (SPCA). Together with principal component analysis (PCA), these four methods cover both supervised and unsupervised approaches, as well as both filter-type and projection-type dimensionality reduction. For our experiments we use the VAM corpus. We extract two sets of features and evaluate the four dimensionality reduction methods on each set individually and on their combination. The experimental results show that, even with a dimensionality reduction stage, a longer speech feature vector does not necessarily yield a more accurate prediction of emotion.
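Of the four methods, the elastic net performs selection directly. Its L1 penalty drives the coefficients of uninformative features to exactly zero, so the features with nonzero coefficients form the reduced feature set. The sketch below is a minimal pure-Python illustration that assumes centred features, a continuous (regression-style) emotion target, and the objective (1/2n)*||y - Xb||^2 + lam1*||b||_1 + (lam2/2)*||b||^2, which it minimises by cyclic coordinate descent. The function names, the penalty scaling, and the toy data are illustrative assumptions and do not reflect the procedure or settings used in this study.

```python
def soft_threshold(z, gamma):
    """S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net(X, y, lam1, lam2, n_iter=200, tol=1e-10):
    """Cyclic coordinate descent for the (assumed) objective
        (1/2n) * ||y - X b||^2 + lam1 * ||b||_1 + (lam2 / 2) * ||b||^2.
    X is a list of n rows of d features; X and y are assumed to be centred,
    so no intercept is fitted."""
    n, d = len(X), len(X[0])
    b = [0.0] * d
    r = list(y)  # residual y - X b; equals y while b is all zeros
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(d)]
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(d):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information; keep b[j] = 0
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(X[i][j] * (r[i] + X[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam1) / (col_sq[j] + lam2)
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= X[i][j] * delta  # keep the residual in sync with b
                b[j] = new_bj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return b


def selected_features(b, eps=1e-12):
    """Indices of features kept by the sparse solution (nonzero coefficients)."""
    return [j for j, bj in enumerate(b) if abs(bj) > eps]


# Hypothetical toy data, already centred: three mutually orthogonal features
# and a target that depends only on feature 0 (y = 2 * x0).
X = [
    [-2.5, 1.0, 1.0],
    [-1.5, -1.0, 1.0],
    [-0.5, 0.0, -2.0],
    [0.5, 0.0, -2.0],
    [1.5, -1.0, 1.0],
    [2.5, 1.0, 1.0],
]
y = [2.0 * row[0] for row in X]

b = elastic_net(X, y, lam1=0.1, lam2=0.01)
print(b, selected_features(b))
```

On this toy data only feature 0 survives, and its coefficient is shrunk slightly below 2 by the penalties. Raising lam1 prunes more features. The small L2 term is what distinguishes the elastic net from the lasso: Zou and Hastie show that it encourages strongly correlated features, which are common among overlapping speech descriptors, to be kept or dropped together.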
