Exploiting a Vowel-Based Approach for Acted Emotion Recognition

This paper describes a new approach to emotion recognition. Our contribution is based on the extraction and characterization of phonemic units, vowels and consonants, provided by a pseudo-phonetic speech segmentation phase combined with a vowel detector. For the emotion recognition task, we explore acoustic and prosodic features computed on these pseudo-phonetic segments (vowels and consonants), and we compare this approach with the traditional voiced/unvoiced segmentation. Classification is performed by the well-known k-nearest-neighbors (k-NN) classifier on two emotional speech databases: Berlin (German) and Aholab (Basque).
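The classification step described above can be illustrated with a minimal k-NN sketch. This is not the paper's implementation: the feature values and emotion labels below are hypothetical stand-ins for per-segment acoustic/prosodic measures (e.g. mean F0, energy, duration), chosen only to show how majority voting over nearest neighbors assigns an emotion label.

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """Classify a feature vector by majority vote among its k nearest
    training examples, using Euclidean distance."""
    dists = sorted(
        (math.dist(feats, query), label) for feats, label in train
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Toy training set: (feature vector, emotion label). Values are
# illustrative stand-ins for per-segment acoustic/prosodic features,
# not numbers taken from the paper or its corpora.
train = [
    ((120.0, 0.20, 0.18), "neutral"),
    ((118.0, 0.25, 0.20), "neutral"),
    ((210.0, 0.80, 0.12), "anger"),
    ((205.0, 0.75, 0.11), "anger"),
    ((160.0, 0.40, 0.30), "sadness"),
]

print(knn_classify(train, (208.0, 0.78, 0.12), k=3))  # → anger
```

In practice each segment (vowel, consonant, or voiced/unvoiced unit) would yield one such feature vector, and distances could be computed on normalized features so that no single dimension dominates.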
