Speech emotion recognition using supervised manifold learning based on all-class and pairwise-class feature extraction

Supervised manifold learning has been widely used in speech emotion recognition, but it typically reduces dimensionality with all classes considered jointly, so the differences between the feature subsets that best separate individual classes are ignored. This paper proposes a feature extraction scheme based on the fusion of all-class and pairwise-class information: an improved manifold learning method extracts the features, and the resulting feature subsets are divided into several parts, one based on the all-class structure and the others each based on one pair of classes. The all-class subset is used in a K-nearest neighbor (KNN) classifier and the pairwise subsets are used in Support Vector Machine (SVM) classifiers, forming a supervised multi-classifier system for speech emotion recognition. Experiments show a significant improvement in recognition accuracy.
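The multi-classifier layout described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the improved manifold learning method, so scikit-learn's `LinearDiscriminantAnalysis` stands in as a generic supervised projection, and the fusion rule (one vote per pairwise SVM, with the KNN prediction as a half-weight tie-breaker) is hypothetical. The class name `AllPairFusion` and the synthetic data are likewise made up for the example.

```python
from itertools import combinations

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC


class AllPairFusion:
    """Hypothetical fusion of an all-class KNN with pairwise-class SVMs."""

    def __init__(self, n_neighbors=5):
        self.n_neighbors = n_neighbors

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        # All-class subset: one projection learned from every class jointly.
        # LDA is a stand-in here, NOT the paper's manifold learning method.
        self.all_proj_ = LinearDiscriminantAnalysis().fit(X, y)
        self.knn_ = KNeighborsClassifier(n_neighbors=self.n_neighbors)
        self.knn_.fit(self.all_proj_.transform(X), y)
        # Pairwise subsets: one projection and one SVM per pair of classes.
        self.pairs_ = []
        for a, b in combinations(self.classes_, 2):
            mask = (y == a) | (y == b)
            proj = LinearDiscriminantAnalysis().fit(X[mask], y[mask])
            svm = SVC(kernel="rbf").fit(proj.transform(X[mask]), y[mask])
            self.pairs_.append((proj, svm))
        return self

    def predict(self, X):
        index = {c: i for i, c in enumerate(self.classes_)}
        rows = np.arange(X.shape[0])
        votes = np.zeros((X.shape[0], len(self.classes_)))
        # Assumed fusion rule: each pairwise SVM casts one vote per sample.
        for proj, svm in self.pairs_:
            pred = svm.predict(proj.transform(X))
            votes[rows, [index[p] for p in pred]] += 1.0
        # Assumed tie-breaker: the all-class KNN adds half a vote.
        knn_pred = self.knn_.predict(self.all_proj_.transform(X))
        votes[rows, [index[p] for p in knn_pred]] += 0.5
        return self.classes_[votes.argmax(axis=1)]


# Synthetic stand-in for emotion features: 4 classes, 6 dimensions.
rng = np.random.default_rng(0)
y = np.repeat(np.arange(4), 50)
centers = 6.0 * np.eye(4, 6)
X = centers[y] + rng.normal(size=(y.size, 6))

model = AllPairFusion().fit(X, y)
predictions = model.predict(X)
accuracy = float((predictions == y).mean())
```

With four classes the sketch trains one KNN and six pairwise SVMs. Real use would replace the synthetic arrays with acoustic feature vectors and evaluate on held-out data rather than on the training set.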
