Emotion Perception and Recognition from Speech

With the increasing role of speech interfaces in human-computer interaction applications, automatically recognizing emotions from human speech has become increasingly important. This chapter begins by introducing the correlations between basic speech features, such as pitch, intensity, formants, and Mel-frequency cepstral coefficients (MFCCs), and emotional states. It then describes several recognition methods, including support vector machines (SVMs), K-nearest neighbors (KNN), and neural networks, to illustrate the performance of previously proposed models.
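A typical pipeline of the kind described above can be sketched in a few steps. The minimal Python sketch below computes two of the listed prosodic features per frame (pitch via a plain autocorrelation peak search, intensity via RMS amplitude), averages them over an utterance, and classifies the resulting vector with K-nearest neighbors after z-score scaling. Everything here is an illustrative assumption rather than the chapter's method: the function names (frames, rms_intensity, autocorr_pitch, utterance_features, knn_predict, tone, build_toy_model), the 75 to 400 Hz pitch search range, the 400-sample frames at 8 kHz, the pure tones standing in for real speech, and the hypothetical "low_arousal"/"high_arousal" labels. Formants, MFCCs, SVMs, and neural networks are not shown.

```python
import math
from collections import Counter

# All names, parameter values, and toy data below are hypothetical illustrations,
# not the chapter's implementation or any real corpus.

def frames(signal, frame_len=400, hop=200):
    """Split a signal into overlapping frames, dropping the incomplete tail."""
    return [signal[i:i + frame_len] for i in range(0, len(signal) - frame_len + 1, hop)]

def rms_intensity(frame):
    """Root-mean-square amplitude of one frame, used as an intensity proxy."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def autocorr_pitch(frame, sample_rate, fmin=75.0, fmax=400.0):
    """Crude F0 estimate (Hz): the lag with the largest autocorrelation in [fmin, fmax]."""
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_val = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        val = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return sample_rate / best_lag if best_lag else 0.0  # 0.0 means "no pitch found"

def utterance_features(signal, sample_rate):
    """Utterance-level feature vector: [mean F0 in Hz, mean RMS intensity]."""
    fs = frames(signal)
    f0 = [autocorr_pitch(f, sample_rate) for f in fs]
    energy = [rms_intensity(f) for f in fs]
    return [sum(f0) / len(f0), sum(energy) / len(energy)]

def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k nearest training vectors after z-score scaling."""
    dims = range(len(query))
    means = [sum(v[d] for v in train_x) / len(train_x) for d in dims]
    stds = [math.sqrt(sum((v[d] - means[d]) ** 2 for v in train_x) / len(train_x)) or 1.0
            for d in dims]

    def scale(v):
        # Put pitch (Hz) and intensity (amplitude) on comparable scales.
        return [(v[d] - means[d]) / stds[d] for d in dims]

    xs, q = [scale(v) for v in train_x], scale(query)
    nearest = sorted(range(len(xs)), key=lambda i: math.dist(xs[i], q))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

def tone(freq, amp, sample_rate=8000, n=1200):
    """Pure tone standing in for a voiced utterance (toy data only)."""
    return [amp * math.sin(2 * math.pi * freq * i / sample_rate) for i in range(n)]

def build_toy_model(sample_rate=8000):
    """Hypothetical labeled set: higher pitch and intensity tagged 'high_arousal'."""
    corpus = [(tone(100, 0.10), "low_arousal"), (tone(125, 0.20), "low_arousal"),
              (tone(160, 0.15), "low_arousal"), (tone(200, 0.70), "high_arousal"),
              (tone(250, 0.80), "high_arousal"), (tone(320, 0.60), "high_arousal")]
    x = [utterance_features(s, sample_rate) for s, _ in corpus]
    y = [label for _, label in corpus]
    return x, y

if __name__ == "__main__":
    x, y = build_toy_model()
    query = utterance_features(tone(250, 0.9), 8000)
    print(knn_predict(x, y, query))  # high_arousal
```

KNN fits a sketch like this because it has no training step; replacing knn_predict with an SVM or a neural network would leave the feature extraction unchanged. The z-score scaling matters because mean pitch (hundreds of Hz) and RMS amplitude (below 1 here) live on very different numeric ranges, and unscaled Euclidean distance would be dominated by pitch alone.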
