A Self-Attentive Emotion Recognition Network

Attention networks constitute the state-of-the-art paradigm for capturing long temporal dynamics. This paper examines the efficacy of this paradigm in the challenging task of emotion recognition in dyadic conversations. We introduce a novel attention mechanism capable of inferring the magnitude of the effect of each past utterance on the current speaker's emotional state. The proposed self-attention network captures the correlation patterns among consecutive encoder network states, thus enabling robust and effective modeling of temporal dynamics over arbitrarily long temporal horizons. We demonstrate the effectiveness of our approach on the challenging IEMOCAP benchmark. We show that our methodology outperforms state-of-the-art alternatives and commonly used approaches, giving rise to promising new research directions in the context of Online Social Network (OSN) analysis tasks.
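
As a rough illustration of the general idea of weighting past utterances by their relevance to the current one, the sketch below computes scaled dot-product attention weights over a list of past encoder states. It is not the mechanism proposed in this paper, whose exact formulation is not given in the abstract; the function names (attention_weights, attend), the two-dimensional toy states, and the choice of scaled dot-product scoring are all assumptions made for illustration.

```python
import math


def attention_weights(current, past):
    """Softmax-normalised scaled dot-product scores of `current` against each past state.

    Hypothetical helper: `current` is one encoder state (list of floats),
    `past` is a list of earlier encoder states of the same dimension.
    """
    scale = math.sqrt(len(current))
    scores = [
        sum(c * p for c, p in zip(current, state)) / scale
        for state in past
    ]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(current, past):
    """Return the attention weights and the weighted sum of past states."""
    weights = attention_weights(current, past)
    dim = len(current)
    context = [
        sum(w * state[i] for w, state in zip(weights, past))
        for i in range(dim)
    ]
    return weights, context


# Toy example: three past utterance states and one current state (assumed values).
past_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
current_state = [1.0, 0.0]
weights, context = attend(current_state, past_states)
```

Here each weight can be read as an estimate of how strongly a past utterance bears on the current one, and the context vector is the resulting summary of the conversation history. A learned model would replace the raw dot product with trained projections of the encoder states.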
