Vowels Formants Analysis Allows Straightforward Detection of High Arousal Acted and Spontaneous Emotions

The role of automatic emotion recognition from speech grows continually because of the widely accepted importance of reacting to the emotional state of the user in human-computer interaction. Most state-of-the-art emotion recognition methods are based on context-independent turn- and frame-level analysis. In our earlier ICME 2011 article we showed that robust detection of high arousal acted emotions can be performed on a context-dependent, per-vowel basis. In contrast to HMM/GMM classification with 39-dimensional MFCC vectors, a much more convenient Neyman-Pearson criterion operating on only a single average F1 (first formant) value is employed here. In this paper we apply the proposed method to spontaneous emotion recognition from speech. We also avoid speaker-dependent acoustic features in favor of gender-specific ones. Finally, we compare detection performance on acted and spontaneous emotions for different criterion threshold values.
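To make the detection scheme concrete, below is a minimal Python sketch of a Neyman-Pearson style likelihood-ratio test on a single averaged F1 value. It assumes Gaussian class-conditional densities for F1 under neutral and high-arousal speech; all numeric parameters and the function name np_detector are illustrative assumptions, not values from the paper (where the statistics are gender-specific and estimated per vowel).

```python
import numpy as np
from scipy.stats import norm

# Hypothetical Gaussian parameters for the average first formant (F1, in Hz)
# under neutral vs. high-arousal speech; real values would be estimated from
# a labeled corpus and, per the paper, would be gender-specific.
MU_NEUTRAL, SIGMA_NEUTRAL = 500.0, 60.0
MU_AROUSED, SIGMA_AROUSED = 620.0, 80.0

def np_detector(avg_f1_hz: float, threshold: float = 1.0) -> bool:
    """Likelihood-ratio test: flag high arousal when
    p(F1 | aroused) / p(F1 | neutral) exceeds `threshold`.

    Varying `threshold` trades false alarms against misses, which is
    the kind of comparison the paper performs across threshold values.
    """
    lr = (norm.pdf(avg_f1_hz, MU_AROUSED, SIGMA_AROUSED)
          / norm.pdf(avg_f1_hz, MU_NEUTRAL, SIGMA_NEUTRAL))
    return lr > threshold

if __name__ == "__main__":
    for f1 in (480.0, 560.0, 650.0):
        print(f"avg F1 = {f1:5.1f} Hz -> high arousal: {np_detector(f1)}")
```

The appeal of this one-dimensional test over an HMM/GMM pipeline with 39-dimensional MFCC vectors is its simplicity: a single scalar per vowel and a single tunable threshold.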

[1] Caren Brinckmann et al. The "Kiel Corpus of Read Speech" as a resource for prosody prediction in speech synthesis, 2005.

[2] Ingo Siegert et al. Vowels formants analysis allows straightforward detection of high arousal emotions, 2011, IEEE International Conference on Multimedia and Expo.

[3] Shrikanth S. Narayanan et al. The Vera am Mittag German audio-visual emotional speech database, 2008, IEEE International Conference on Multimedia and Expo.

[4] Klaus R. Scherer et al. Emotion dimensions and formant position, 2009, INTERSPEECH.

[5] Florian Schiel et al. The Bavarian Archive for Speech Signals, 1997.

[6] William J. J. Roberts et al. Speaker classification using composite hypothesis testing and list decoding, 2005, IEEE Transactions on Speech and Audio Processing.

[7] Björn W. Schuller et al. Acoustic emotion recognition: A benchmark comparison of performances, 2009, IEEE Workshop on Automatic Speech Recognition & Understanding.

[8] Paul Boersma et al. Praat, a system for doing phonetics by computer, 2002.

[9] Florian Schiel et al. The SmartKom Multimodal Corpus at BAS, 2002, LREC.

[10] Hans G. Tillmann et al. The Phondat-Verbmobil speech corpus, 1995, EUROSPEECH.

[11] Astrid Paeschke et al. A database of German emotional speech, 2005, INTERSPEECH.

[12] Björn W. Schuller et al. Segmenting into Adequate Units for Automatic Recognition of Emotion-Related Episodes: A Speech-Based Approach, 2010, Advances in Human-Computer Interaction.

[13] Hermann Ney et al. Joint-sequence models for grapheme-to-phoneme conversion, 2008, Speech Communication.