Hybrid models for opinion analysis in speech interactions

Sentiment analysis is a popular area of machine learning that has developed considerably in recent years. Nevertheless, most sentiment analysis systems are generic: they do not take advantage of the interactional context and the possibilities it offers. My PhD thesis focuses on building a system that uses the information exchanged between two speakers to analyze opinion within a human-human or human-agent interaction. This paper outlines a research plan for a system that analyzes opinion in speech interactions using hybrid discriminative models. We present the state of the art in our domain, then discuss our prior research in the area and the preliminary results we obtained. Finally, we conclude with the future directions we intend to explore during the remainder of this PhD work.
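To make the notion of a hybrid discriminative model over an interaction concrete, here is a minimal sketch, not the thesis system itself, of one common design: a neural sequence encoder over per-utterance feature vectors (so each prediction can depend on the surrounding dialogue turns) feeding a discriminative linear classifier per utterance. The feature dimension, label set, and data below are assumptions chosen purely for illustration.

```python
# Illustrative sketch (assumed design, not the paper's system): a BiLSTM
# over the sequence of utterances in a dialogue, with a per-utterance
# discriminative opinion classifier on top.
import torch
import torch.nn as nn

class DialogueOpinionTagger(nn.Module):
    def __init__(self, feat_dim=88, hidden_dim=64, n_labels=3):
        super().__init__()
        # The bidirectional LSTM lets each utterance's prediction use the
        # interactional context: both speakers' preceding and following turns.
        self.encoder = nn.LSTM(feat_dim, hidden_dim, batch_first=True,
                               bidirectional=True)
        self.classifier = nn.Linear(2 * hidden_dim, n_labels)

    def forward(self, x):
        # x: (batch, n_utterances, feat_dim), e.g. acoustic functionals
        # concatenated with lexical embeddings for each utterance.
        context, _ = self.encoder(x)
        return self.classifier(context)  # per-utterance opinion logits

# Toy usage: one dialogue of 10 utterances with 88-dim feature vectors
# (88 happens to match the eGeMAPS acoustic set, purely as an example).
model = DialogueOpinionTagger()
dialogue = torch.randn(1, 10, 88)
logits = model(dialogue)  # shape (1, 10, 3): negative/neutral/positive
```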
