Recognizing Affect from Linguistic Information in 3D Continuous Space

Most research on recognizing emotion-related states from the human speech signal concentrates on acoustic analysis. However, results from the last decade show that the task cannot be solved to complete satisfaction, especially for real-life speech data and in particular for the assessment of speakers' valence. This paper therefore investigates novel approaches to the additional exploitation of linguistic information. To ensure good applicability to the real world, spontaneous speech and non-acted, non-prototypical emotions are examined in the increasingly popular dimensional model in 3D continuous space. As linguistic analysis approaches and experiments are lacking for this model, various methods are proposed. Best results are obtained with bag-of-n-gram and character n-gram approaches, introduced here for the first time for this task, which allow for an advanced vector space representation of the spoken content. Furthermore, string kernels are considered. By early fusion and combined-space optimization of the proposed linguistic features with acoustic ones, the regression of continuous emotion primitives outperforms reported benchmark results on the VAM corpus of highly emotional face-to-face communication.
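
To make the linguistic feature extraction concrete, the following minimal Python sketch illustrates bag-of-n-gram and character n-gram vector-space features, early fusion with acoustic features, and support vector regression of a continuous emotion primitive such as valence. It uses scikit-learn rather than the paper's toolchain, and the transcripts, labels, acoustic features, and parameter settings are illustrative assumptions, not the authors' configuration.

import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import FeatureUnion
from sklearn.svm import SVR

# Hypothetical ASR transcripts and continuous valence labels (placeholders).
transcripts = ["oh that is just great", "leave me alone", "thank you so much"]
valence = np.array([0.3, -0.6, 0.8])
# Placeholder acoustic feature vectors (e.g. prosodic/spectral functionals).
acoustic = np.random.rand(len(transcripts), 16)

# Bag of word n-grams plus character n-grams as a sparse vector space.
linguistic = FeatureUnion([
    ("word_ngrams", CountVectorizer(analyzer="word", ngram_range=(1, 2))),
    ("char_ngrams", CountVectorizer(analyzer="char_wb", ngram_range=(2, 4))),
])
X_ling = linguistic.fit_transform(transcripts).toarray()

# Early fusion: concatenate linguistic and acoustic features in one space.
X = np.hstack([X_ling, acoustic])

# Support vector regression of the continuous emotion primitive.
model = SVR(kernel="linear").fit(X, valence)
print(model.predict(X))

In the combined feature space, feature selection or kernel choice can then be optimized jointly over linguistic and acoustic dimensions, which is the spirit of the combined-space optimization described above.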
