Classification of Complex Information: Inference of Co-Occurring Affective States from Their Expressions in Speech

We present a classification algorithm for inferring affective states (emotions, mental states, attitudes, and the like) from their nonverbal expressions in speech. It is based on two observations: affective states can occur simultaneously, and different sets of vocal features, such as intonation and speech rate, distinguish the nonverbal expressions of different affective states. The input to the inference system was a large set of vocal features and metrics extracted from each utterance. The classification algorithm conducted independent pairwise comparisons between nine affective-state groups, using different subsets of vocal-feature metrics and different classification algorithms for different pairs of groups. The average classification accuracy of the 36 pairwise machines was 75 percent, using 10-fold cross-validation. The pairwise results were consolidated into a single ranked list of the nine affective-state groups. This list was the output of the system and represented the inferred combination of co-occurring affective states for the analyzed utterance. The inference accuracy of the combined machine was 83 percent. The system automatically characterized over 500 affective-state concepts from the Mind Reading database. The inference of co-occurring affective states was validated by comparing the inferred combinations to the lexical definitions of the labels of the analyzed sentences. The distinguishing capabilities of the system were comparable to human performance.
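The consolidation step described above (36 independent one-vs-one comparisons whose outcomes are merged into a single ranked list of nine groups) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the group names are hypothetical placeholders, and `pairwise_decision` is a toy stand-in for the paper's 36 trained binary classifiers, each of which would use its own feature subset and learning algorithm.

```python
from itertools import combinations
from collections import Counter

# Hypothetical placeholder names for the nine affective-state groups.
GROUPS = ["absorbed", "excited", "interested", "joyful", "opposed",
          "stressed", "sure", "thinking", "unsure"]

def pairwise_decision(features, group_a, group_b):
    """Toy stand-in for one of the 36 trained binary classifiers.
    A real system would apply a classifier trained for this specific
    pair, on its own subset of vocal-feature metrics."""
    score_a = sum(features) + len(group_a)  # arbitrary illustrative rule
    score_b = sum(features) + len(group_b)
    return group_a if score_a >= score_b else group_b

def rank_groups(features):
    """Run all pairwise comparisons (9 choose 2 = 36) and consolidate
    the wins into one ranked list of the nine groups. The full ordered
    list represents the inferred combination of co-occurring states."""
    votes = Counter({g: 0 for g in GROUPS})
    for a, b in combinations(GROUPS, 2):
        votes[pairwise_decision(features, a, b)] += 1
    return [g for g, _ in votes.most_common()]

ranking = rank_groups([0.2, 1.3, 0.7])
print(ranking)
```

Because each utterance yields a full ranking rather than a single label, co-occurring states appear as several highly ranked groups, which is what allows the inferred combination to be compared against the lexical definition of the utterance's label.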
