Speech Emotion Classification on a Riemannian Manifold

We present a novel algorithm for speech emotion classification. In contrast to previous methods, we additionally consider the relations between simple features by using covariance matrices as feature descriptors. Since nonsingular covariance matrices do not form a linear (vector) space, we endow the space of such matrices with an affine-invariant metric, which turns it into a Riemannian manifold. We then approximate the manifold locally by its tangent space. Classification is performed in the tangent space, and a generalized principal component analysis is presented. We evaluate the algorithm on speech emotion classification; the experimental results show an improvement of around 13% in recognition accuracy (+3% with PCA). On this basis, we are able to train a single simple model that accurately differentiates emotions expressed by speakers of both genders.
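The pipeline outlined above (covariance descriptors, an affine-invariant metric, a tangent-space approximation, then classification with PCA) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not specify the frame-level features, the tangent point, the classifier, or the exact form of the generalized PCA. So the sketch assumes a per-utterance matrix of frame features, adds a small ridge `eps` to keep each covariance nonsingular, takes the tangent space at the Karcher mean, and uses the whitened log map with a sqrt(2)-weighted upper-triangle vectorization. It then substitutes plain PCA on the tangent vectors and a nearest-centroid classifier. All function names (`covariance_descriptor`, `whitened_log`, `fit_tangent_classifier`, ...) are hypothetical.

```python
import numpy as np


def _eig_fn(S, fn):
    """Apply a scalar function to a symmetric matrix through its eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * fn(w)) @ V.T


def covariance_descriptor(frames, eps=1e-6):
    """Covariance of frame-level features (rows = frames, columns = features).

    The ridge eps * I is an assumption added to keep the matrix nonsingular."""
    cov = np.cov(np.asarray(frames, dtype=float), rowvar=False)
    return cov + eps * np.eye(cov.shape[0])


def affine_invariant_distance(X, Y):
    """d(X, Y) = ||log(X^{-1/2} Y X^{-1/2})||_F, computed from the eigenvalues."""
    Xi = _eig_fn(X, lambda w: w ** -0.5)
    M = Xi @ Y @ Xi
    return float(np.sqrt(np.sum(np.log(np.linalg.eigvalsh((M + M.T) / 2)) ** 2)))


def whitened_log(P, X):
    """log(P^{-1/2} X P^{-1/2}): X mapped to the tangent space at P (whitened coordinates)."""
    Pi = _eig_fn(P, lambda w: w ** -0.5)
    M = Pi @ X @ Pi
    return _eig_fn((M + M.T) / 2, np.log)


def vectorize(S):
    """Upper triangle, off-diagonals scaled by sqrt(2) so the Euclidean norm equals ||S||_F."""
    iu = np.triu_indices(S.shape[0])
    return np.where(iu[0] == iu[1], 1.0, np.sqrt(2.0)) * S[iu]


def karcher_mean(mats, iters=20, tol=1e-10):
    """Karcher (intrinsic) mean under the affine-invariant metric, by fixed-point iteration."""
    P = np.mean(mats, axis=0)  # arithmetic mean as the starting point
    for _ in range(iters):
        T = np.mean([whitened_log(P, X) for X in mats], axis=0)  # mean tangent step
        Ps = _eig_fn(P, np.sqrt)
        P = Ps @ _eig_fn(T, np.exp) @ Ps  # exponential map back onto the manifold
        P = (P + P.T) / 2  # remove round-off asymmetry
        if np.linalg.norm(T) < tol:
            break
    return P


def _tangent_features(P, mats):
    return np.array([vectorize(whitened_log(P, X)) for X in mats])


def fit_tangent_classifier(mats, labels, n_components=None):
    """Hypothetical pipeline: tangent space at the Karcher mean, plain PCA, nearest centroid."""
    P = karcher_mean(mats)
    Z = _tangent_features(P, mats)
    mu = Z.mean(axis=0)
    _, _, Vt = np.linalg.svd(Z - mu, full_matrices=False)  # PCA on tangent vectors
    W = Vt[:n_components].T if n_components else np.eye(Z.shape[1])
    Y = (Z - mu) @ W
    labels = np.asarray(labels)
    classes = np.unique(labels)
    centroids = np.array([Y[labels == c].mean(axis=0) for c in classes])
    return {"P": P, "mu": mu, "W": W, "classes": classes, "centroids": centroids}


def predict(model, mats):
    """Assign each matrix to the nearest class centroid in the projected tangent space."""
    Y = (_tangent_features(model["P"], mats) - model["mu"]) @ model["W"]
    d = np.linalg.norm(Y[:, None, :] - model["centroids"][None, :, :], axis=2)
    return model["classes"][np.argmin(d, axis=1)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)

    def utt(scale):
        """Hypothetical synthetic frame features: 200 frames x 3 features."""
        return rng.normal(size=(200, 3)) * scale

    train = [covariance_descriptor(utt([1, 1, 1])) for _ in range(20)]
    train += [covariance_descriptor(utt([3, 1, 0.3])) for _ in range(20)]
    model = fit_tangent_classifier(train, [0] * 20 + [1] * 20, n_components=2)
    print(predict(model, [covariance_descriptor(utt([3, 1, 0.3]))]))
```

Whitening by P^{-1/2} before the matrix logarithm means that the Frobenius norm of a tangent vector equals the affine-invariant distance from the reference point P. As a result, Euclidean operations such as PCA and centroid distances in these coordinates approximate the manifold geometry best near P.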
