Emotional Speech Analysis on Nonlinear Manifold

This paper presents a speech emotion recognition system on a nonlinear manifold. Instead of straight-line (Euclidean) distance, geodesic distance is adopted to preserve the intrinsic geometry of the speech corpus. Based on geodesic distance estimation, we develop an enhanced Lipschitz embedding that maps the 64-dimensional acoustic features into a six-dimensional space. In this space, speech data with the same emotional state lie close to a common plane, which benefits emotion classification. The compressed test data are classified into six archetypal emotional states (neutral, anger, fear, happiness, sadness and surprise) by a trained linear support vector machine (SVM). Experimental results demonstrate that, compared with traditional feature extraction on a linear manifold and with feature selection, the proposed system achieves a 9%-26% relative improvement in speaker-independent emotion recognition and a 5%-20% improvement in speaker-dependent emotion recognition.
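The abstract names two steps, geodesic distance estimation and a Lipschitz embedding, without giving their details. The following is a minimal Python sketch under stated assumptions, not the authors' method. Geodesic distance is approximated in the manner of Isomap, by shortest paths through a k-nearest-neighbour graph whose edges carry Euclidean weights. Each embedding coordinate is the geodesic distance from a sample to the nearest member of a reference set, as in a plain Lipschitz embedding. Using one reference set per emotional state (six sets, hence six output dimensions) is our assumption. The helper names (`knn_graph`, `geodesic_from`, `lipschitz_embed`) and the neighbourhood size `k` are hypothetical, and the paper's "enhanced" variant is not reproduced.

```python
import heapq
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_graph(points: List[Vector], k: int) -> List[Dict[int, float]]:
    """Symmetric k-nearest-neighbour graph with Euclidean edge weights."""
    n = len(points)
    graph: List[Dict[int, float]] = [dict() for _ in range(n)]
    for i in range(n):
        # Sort every other point by distance (ties broken by index) and keep the k closest.
        dists = sorted((euclidean(points[i], points[j]), j) for j in range(n) if j != i)
        for d, j in dists[:k]:
            graph[i][j] = d
            graph[j][i] = d  # symmetrise so the graph is undirected
    return graph


def geodesic_from(graph: List[Dict[int, float]], source: int) -> List[float]:
    """Dijkstra shortest paths; graph distance approximates geodesic distance."""
    dist = [math.inf] * len(graph)
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry, a shorter path was already found
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def lipschitz_embed(
    points: List[Vector], reference_sets: List[List[int]], k: int
) -> List[List[float]]:
    """Coordinate i of a point is its geodesic distance to the closest member of reference set i.

    ASSUMPTION: reference_sets holds one list of training-sample indices per
    emotion class, so six classes give a six-dimensional embedding.
    """
    graph = knn_graph(points, k)
    # All-pairs geodesic distances: one Dijkstra run per point.
    geo = [geodesic_from(graph, s) for s in range(len(points))]
    return [
        [min(geo[p][a] for a in ref) for ref in reference_sets]
        for p in range(len(points))
    ]
```

On a U-shaped toy set such as `[(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (3, 2), (3, 1), (3, 0)]` with `k=2`, the two arm tips are 3 apart in straight-line distance but 7 apart along the graph, which is the distinction between straight-line and geodesic distance that the abstract relies on. In the described system, the embedded vectors would then be passed to a linear SVM; that classification step is not part of this sketch.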