Emotion recognition using linear transformations in combination with video

The paper discusses the use of linear transformations of Hidden Markov Models, normally employed for speaker and environment adaptation, as a way of extracting the emotional components from speech. A constrained version of the Maximum Likelihood Linear Regression (CMLLR) transformation is used as a feature for classifying the emotional state as normal or aroused. We present a procedure for incrementally building a set of speaker-independent acoustic models, which are used to estimate the CMLLR transformations for emotion classification. An audio-video database of spontaneous emotions (AvID) is briefly presented, since it forms the basis for evaluating the proposed method. Emotion classification using the video part of the database is also described, and the added value of combining the visual information with the audio features is shown.
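The abstract does not say how the CMLLR transforms are estimated or how they are classified. The sketch below illustrates the idea under loudly stated assumptions:

- The acoustic model is reduced to a single diagonal-covariance GMM rather than a set of HMMs.
- The CMLLR matrix is restricted to be diagonal, so each feature dimension gets its own scale `a` and offset `b` with a closed-form EM update. The general constrained transform is a full or block-diagonal matrix updated row by row.
- The names `estimate_diag_cmllr`, `cmllr_features` and `nearest_centroid` are hypothetical.
- The nearest-centroid classifier is only a stand-in, not the classifier used in the paper.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical model format: list of (weight, means, variances) per diagonal Gaussian.
Gmm = List[Tuple[float, List[float], List[float]]]


def log_gauss(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of a diagonal-covariance Gaussian."""
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (xi - m) ** 2 / v for xi, m, v in zip(x, mean, var)
    )


def posteriors(frame: Sequence[float], gmm: Gmm) -> List[float]:
    """Component posteriors for one frame (log-sum-exp for numerical stability)."""
    logs = [math.log(w) + log_gauss(frame, m, v) for w, m, v in gmm]
    top = max(logs)
    exps = [math.exp(l - top) for l in logs]
    total = sum(exps)
    return [e / total for e in exps]


def estimate_diag_cmllr(frames: List[List[float]], gmm: Gmm, iterations: int = 3):
    """EM estimate of x' = a*x + b per dimension (diagonal-only CMLLR, an assumption).

    Assumes the frames are not constant in any dimension (otherwise p == 0 below).
    """
    dim = len(frames[0])
    a, b = [1.0] * dim, [0.0] * dim
    for _ in range(iterations):
        # E-step: posteriors of the currently transformed frames. The Jacobian
        # prod(|a|) is shared by all components, so it cancels in the posteriors.
        gammas = [posteriors([ai * xi + bi for ai, xi, bi in zip(a, x, b)], gmm) for x in frames]
        beta = float(len(frames))  # total occupancy: posteriors sum to 1 per frame
        for i in range(dim):
            # Variance-weighted sufficient statistics for dimension i.
            g0 = g1 = g2 = k0 = k1 = 0.0
            for x, gam in zip(frames, gammas):
                for g, (_, mean, var) in zip(gam, gmm):
                    w = g / var[i]
                    g0 += w
                    g1 += w * x[i]
                    g2 += w * x[i] ** 2
                    k0 += w * mean[i]
                    k1 += w * mean[i] * x[i]
            # Eliminating b = (k0 - a*g1)/g0 leaves the quadratic p*a^2 - r*a - beta = 0.
            p = g2 - g1 * g1 / g0
            r = k1 - k0 * g1 / g0
            disc = math.sqrt(r * r + 4 * p * beta)
            roots = [(r + disc) / (2 * p), (r - disc) / (2 * p)]
            # Keep the root with the larger auxiliary value beta*log|a| + 0.5*r*a;
            # on a tie, max() keeps the first (positive) root.
            a[i] = max(roots, key=lambda ai: beta * math.log(abs(ai)) + 0.5 * r * ai)
            b[i] = (k0 - a[i] * g1) / g0
    return a, b


def cmllr_features(a: List[float], b: List[float]) -> List[float]:
    """Flatten the transform parameters into one feature vector per utterance."""
    return a + b


def nearest_centroid(train: Dict[str, List[List[float]]], query: List[float]) -> str:
    """Stand-in classifier: label of the class whose mean vector is closest."""
    best_label, best_dist = "", math.inf
    for label, vectors in train.items():
        centroid = [sum(col) / len(vectors) for col in zip(*vectors)]
        dist = sum((q - c) ** 2 for q, c in zip(query, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label
```

Under these assumptions, each utterance yields one vector `a + b` whose length is twice the acoustic feature dimension. The utterance is then labelled normal or aroused by comparing that vector with per-class means computed from labelled training utterances. For a single standard-normal component, the estimate reduces to `a = 1/std` and `b = -mean/std` of the utterance's frames.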