Multi-Modal Sentiment Classification With Independent and Interactive Knowledge via Semi-Supervised Learning

Multi-modal sentiment analysis extends the conventional text-based formulation of sentiment analysis to a multi-modal setting in which several related modalities are leveraged jointly. In real applications, however, acquiring annotated multi-modal data is typically labor-intensive and time-consuming. In this paper, we aim to reduce the annotation effort for multi-modal sentiment classification via semi-supervised learning. The key idea is to leverage semi-supervised variational autoencoders to mine additional information from unlabeled data for multi-modal sentiment analysis. Specifically, the mined information includes both the independent knowledge within each single modality and the interactive knowledge across different modalities. Empirical evaluation demonstrates the effectiveness of the proposed semi-supervised approach to multi-modal sentiment classification.
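To make the key idea concrete, the following is a minimal PyTorch sketch, not the authors' exact architecture, of a semi-supervised VAE in the spirit of Kingma et al.'s M2 model adapted to two modalities. Per-modality encoders capture independent knowledge, a fusion encoder conditioned on the label captures interactive knowledge, and unlabeled examples contribute by marginalizing over the predicted label distribution. All names and dimensions (e.g., 300-d text features, 74-d acoustic features, the `fuse` layer) are illustrative assumptions rather than the paper's specification.

```python
# Minimal semi-supervised multi-modal VAE sketch (assumed design, not the paper's model).
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiModalSemiVAE(nn.Module):
    def __init__(self, text_dim=300, audio_dim=74, latent_dim=64, num_classes=2):
        super().__init__()
        # Independent (per-modality) encoders.
        self.text_enc = nn.Sequential(nn.Linear(text_dim, 128), nn.ReLU())
        self.audio_enc = nn.Sequential(nn.Linear(audio_dim, 128), nn.ReLU())
        # Interactive encoder: fuses both modalities together with the label.
        self.fuse = nn.Linear(128 * 2 + num_classes, 128)
        self.mu = nn.Linear(128, latent_dim)
        self.logvar = nn.Linear(128, latent_dim)
        # Classifier q(y | x), used for prediction and for unlabeled data.
        self.classifier = nn.Linear(128 * 2, num_classes)
        # Decoder reconstructs the concatenated modalities from (z, y).
        self.dec = nn.Sequential(
            nn.Linear(latent_dim + num_classes, 128), nn.ReLU(),
            nn.Linear(128, text_dim + audio_dim),
        )

    def encode(self, x_text, x_audio, y_onehot):
        h = torch.cat([self.text_enc(x_text), self.audio_enc(x_audio), y_onehot], dim=-1)
        h = F.relu(self.fuse(h))
        return self.mu(h), self.logvar(h)

    def classify(self, x_text, x_audio):
        h = torch.cat([self.text_enc(x_text), self.audio_enc(x_audio)], dim=-1)
        return self.classifier(h)

    def neg_elbo(self, x_text, x_audio, y_onehot):
        mu, logvar = self.encode(x_text, x_audio, y_onehot)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        x_hat = self.dec(torch.cat([z, y_onehot], dim=-1))
        recon = F.mse_loss(
            x_hat, torch.cat([x_text, x_audio], dim=-1), reduction="none"
        ).sum(-1)
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=-1)
        return recon + kl  # per-example negative ELBO

    def loss(self, labeled, unlabeled, alpha=1.0):
        # Labeled term: negative ELBO plus weighted cross-entropy on q(y | x).
        xt, xa, y = labeled
        y_onehot = F.one_hot(y, self.classifier.out_features).float()
        l_labeled = self.neg_elbo(xt, xa, y_onehot).mean()
        l_labeled = l_labeled + alpha * F.cross_entropy(self.classify(xt, xa), y)
        # Unlabeled term: marginalize the label under q(y | x) and subtract its entropy.
        xt_u, xa_u = unlabeled
        q_y = F.softmax(self.classify(xt_u, xa_u), dim=-1)
        l_unlabeled = 0.0
        for c in range(q_y.size(-1)):
            y_c = F.one_hot(
                torch.full((xt_u.size(0),), c, dtype=torch.long, device=xt_u.device),
                q_y.size(-1),
            ).float()
            l_unlabeled = l_unlabeled + (q_y[:, c] * self.neg_elbo(xt_u, xa_u, y_c)).mean()
        l_unlabeled = l_unlabeled - torch.distributions.Categorical(probs=q_y).entropy().mean()
        return l_labeled + l_unlabeled
```

In this sketch the unlabeled loss treats the unknown label as a latent variable, weighting the per-class negative ELBO by the classifier's posterior, which is how a semi-supervised VAE lets unlabeled multi-modal data shape both the modality-specific representations and the fused latent space.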
