MixedEmotions: An Open-Source Toolbox for Multimodal Emotion Analysis

Emotion recognition from user-generated media content is increasingly being embedded in automated systems such as call center operations, recommendation, and assistive technologies, where it provides richer and more informative user and content profiles. To date, however, adding these functionalities has been tedious, costly, and time-consuming, requiring the identification and integration of diverse tools with diverse interfaces, as dictated by the use case at hand. The MixedEmotions Toolbox addresses the need for such functionalities by providing tools for text, audio, video, and linked data processing within an easily integrable plug-and-play platform. These functionalities include: 1) for text processing: emotion and sentiment recognition; 2) for audio processing: emotion, age, and gender recognition; 3) for video processing: face detection and tracking, emotion recognition, facial landmark localization, head pose estimation, face alignment, and body pose estimation; and 4) for linked data: knowledge graph integration. Moreover, the MixedEmotions Toolbox is open-source and free. In this paper, we present the toolbox in the context of the existing landscape and provide a range of detailed benchmarks on standard test-beds showing its state-of-the-art performance. Furthermore, three real-world use cases demonstrate its effectiveness: emotion-driven smart TV, call center monitoring, and brand reputation analysis.
