Recognizing emotions in spoken dialogue with acoustic and lexical cues

Emotions play a vital role in human communication, so it is desirable for virtual agent dialogue systems to recognize and react to users' emotions. However, current automatic emotion recognizers perform poorly compared to humans. Our work aims to improve emotion recognition in spoken dialogue by identifying dialogue cues that are predictive of emotions and by building multimodal recognition models with a knowledge-inspired hierarchy. We conduct experiments on both spontaneous and acted dialogue data to study the efficacy of the proposed approaches. Our results show that incorporating prior knowledge about emotions in dialogue, in either the feature representation or the model structure, benefits automatic emotion recognition.
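To make the hierarchical fusion idea concrete, the sketch below shows one way such a model could be structured, assuming PyTorch: an acoustic encoder first summarizes the frame-level features of an utterance, and its summary then conditions a lexical encoder over the word sequence, reflecting the knowledge-inspired ordering of modalities. All layer sizes (e.g., the 88-dimensional acoustic input, in the spirit of eGeMAPS), the specific fusion order, and the four-way emotion output are illustrative assumptions, not the paper's actual architecture.

```python
# Minimal sketch (not the authors' code) of hierarchical acoustic-lexical
# fusion for utterance-level emotion classification. Dimensions and the
# fusion order are illustrative assumptions.
import torch
import torch.nn as nn

class HierarchicalFusionClassifier(nn.Module):
    def __init__(self, acoustic_dim=88, lexical_dim=300,
                 hidden_dim=128, num_emotions=4):
        super().__init__()
        # Stage 1: encode the acoustic frame sequence of an utterance.
        self.acoustic_enc = nn.LSTM(acoustic_dim, hidden_dim, batch_first=True)
        # Stage 2: encode the word-embedding sequence, conditioned on the
        # acoustic summary by concatenating it to every word vector.
        self.lexical_enc = nn.LSTM(lexical_dim + hidden_dim, hidden_dim,
                                   batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_emotions)

    def forward(self, acoustic_frames, word_embeddings):
        # acoustic_frames: (batch, n_frames, acoustic_dim)
        # word_embeddings: (batch, n_words, lexical_dim)
        _, (a_summary, _) = self.acoustic_enc(acoustic_frames)
        a_summary = a_summary[-1]                       # (batch, hidden_dim)
        # Broadcast the acoustic summary across the word sequence.
        expanded = a_summary.unsqueeze(1).expand(
            -1, word_embeddings.size(1), -1)
        fused_in = torch.cat([word_embeddings, expanded], dim=-1)
        _, (l_summary, _) = self.lexical_enc(fused_in)
        return self.classifier(l_summary[-1])           # emotion logits

# Example with random tensors standing in for real features:
model = HierarchicalFusionClassifier()
logits = model(torch.randn(2, 500, 88), torch.randn(2, 20, 300))
print(logits.shape)  # torch.Size([2, 4])
```

The design choice illustrated here is that the lexical encoder never sees the acoustic signal directly; it only receives a fixed-size acoustic summary, which is one simple way to impose a hierarchy between modalities rather than fusing them by flat feature concatenation.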
