Zero-Shot Emotion Recognition via Affective Structural Embedding

Image emotion recognition attracts much attention in recent years due to its wide applications. It aims to classify the emotional response of humans, where candidate emotion categories are generally defined by specific psychological theories, such as Ekman’s six basic emotions. However, with the development of psychological theories, emotion categories become increasingly diverse, fine-grained, and difficult to collect samples. In this paper, we investigate zero-shot learning (ZSL) problem in the emotion recognition task, which tries to recognize the new unseen emotions. Specifically, we propose a novel affective-structural embedding framework, utilizing mid-level semantic representation, i.e., adjective-noun pairs (ANP) features, to construct an affective embedding space. By doing this, the learned intermediate space can narrow the semantic gap between low-level visual and high-level semantic features. In addition, we introduce an affective adversarial constraint to retain the discriminative capacity of visual features and the affective structural information of semantic features during training process. Our method is evaluated on five widely used affective datasets and the perimental results show the proposed algorithm outperforms the state-of-the-art approaches.

[1]  Bernt Schiele,et al.  Evaluation of output embeddings for fine-grained image classification , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Tao Chen,et al.  DeepSentiBank: Visual Sentiment Concept Classification with Deep Convolutional Neural Networks , 2014, ArXiv.

[3]  Jiebo Luo,et al.  Visual Sentiment Analysis by Attending on Local Image Regions , 2017, AAAI.

[4]  Shaogang Gong,et al.  Semantic Autoencoder for Zero-Shot Learning , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Jeffrey Dean,et al.  Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[6]  Rongrong Ji,et al.  Large-scale visual sentiment ontology and detectors using adjective noun pairs , 2013, ACM Multimedia.

[7]  Bernt Schiele,et al.  Latent Embeddings for Zero-Shot Classification , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Subhasis Chaudhuri,et al.  Structure Aligning Discriminative Latent Embedding for Zero-Shot Learning , 2018, BMVC.

[9]  Shuang Wu,et al.  Zero-Shot Event Detection Using Multi-modal Fusion of Weakly Supervised Concepts , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[10]  Yue Gao,et al.  Approximating Discrete Probability Distribution of Image Emotions by Multi-Modal Features Fusion , 2017, IJCAI.

[11]  Bernt Schiele,et al.  Zero-Shot Learning — The Good, the Bad and the Ugly , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Jeffrey Dean,et al.  Efficient Estimation of Word Representations in Vector Space , 2013, ICLR.

[13]  Debi Prosad Dogra,et al.  Prediction of advertisement preference by fusing EEG response and sentiment analysis , 2017, Neural Networks.

[14]  Jufeng Yang,et al.  Joint Image Emotion Classification and Distribution Learning via Deep Convolutional Neural Network , 2017, IJCAI.

[15]  Venkatesh Saligrama,et al.  Zero Shot Detection , 2018, IEEE Transactions on Circuits and Systems for Video Technology.

[16]  Yue Gao,et al.  Exploring Principles-of-Art Features For Image Emotion Recognition , 2014, ACM Multimedia.

[17]  Kristen A. Lindquist,et al.  The brain basis of emotion: A meta-analytic review , 2012, Behavioral and Brain Sciences.

[18]  Changsheng Xu,et al.  Multi-modal Multi-view Topic-opinion Mining for Social Event Analysis , 2016, ACM Multimedia.

[19]  Yue Gao,et al.  Continuous Probability Distribution Prediction of Image Emotions via Multitask Shared Sparse Regression , 2017, IEEE Transactions on Multimedia.

[20]  Yue Gao,et al.  Predicting Personalized Image Emotion Perceptions in Social Networks , 2018, IEEE Transactions on Affective Computing.

[21]  Rongrong Ji,et al.  SentiBank: large-scale ontology and classifiers for detecting sentiment and emotions in visual content , 2013, ACM Multimedia.

[22]  Allan Hanbury,et al.  Affective image classification using features inspired by psychology and art theory , 2010, ACM Multimedia.

[23]  Ali Farhadi,et al.  Describing objects by their attributes , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[24]  Tsuhan Chen,et al.  A mixed bag of emotions: Model, predict, and transfer emotion distributions , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Yi Yang,et al.  Exploring Semantic Inter-Class Relationships (SIR) for Zero-Shot Action Recognition , 2015, AAAI.

[26]  Yu Qiao,et al.  A Discriminative Feature Learning Approach for Deep Face Recognition , 2016, ECCV.

[27]  Tao Xiang,et al.  Learning to Compare: Relation Network for Few-Shot Learning , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[28]  Tao Xiang,et al.  Learning a Deep Embedding Model for Zero-Shot Learning , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  P. Ekman An argument for basic emotions , 1992 .

[30]  Andrew Y. Ng,et al.  Zero-Shot Learning Through Cross-Modal Transfer , 2013, NIPS.

[31]  Venkatesh Saligrama,et al.  Zero-Shot Learning via Semantic Similarity Embedding , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[32]  Yue Gao,et al.  Real-Time Multimedia Social Event Detection in Microblog , 2018, IEEE Transactions on Cybernetics.

[33]  Jiebo Luo,et al.  Robust Image Sentiment Analysis Using Progressively Trained and Domain Transferred Deep Networks , 2015, AAAI.

[34]  Ming-Hsuan Yang,et al.  Retrieving and Classifying Affective Images via Deep Metric Learning , 2018, AAAI.

[35]  Hema A. Murthy,et al.  A Generative Model for Zero Shot Learning Using Conditional Variational Autoencoders , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[36]  Sam J. Maglio,et al.  Emotional category data on images from the international affective picture system , 2005, Behavior research methods.

[37]  Jiebo Luo,et al.  Building a Large Scale Dataset for Image Emotion Recognition: The Fine Print and The Benchmark , 2016, AAAI.

[38]  Jeffrey Pennington,et al.  GloVe: Global Vectors for Word Representation , 2014, EMNLP.

[39]  L. F. Barrett Are Emotions Natural Kinds? , 2006, Perspectives on psychological science : a journal of the Association for Psychological Science.

[40]  Soma Biswas,et al.  Preserving Semantic Relations for Zero-Shot Learning , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[41]  Yi Yang,et al.  Learning Discriminative Latent Attributes for Zero-Shot Classification , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[42]  Bernt Schiele,et al.  Feature Generating Networks for Zero-Shot Learning , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[43]  Cordelia Schmid,et al.  Label-Embedding for Image Classification , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[44]  Xi Peng,et al.  A Generative Adversarial Approach for Zero-Shot Learning from Noisy Texts , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[45]  P. Ekman,et al.  DIFFERENCES Universals and Cultural Differences in the Judgments of Facial Expressions of Emotion , 2004 .

[46]  Shiguang Shan,et al.  Learning Class Prototypes via Structure Alignment for Zero-Shot Recognition , 2018, ECCV.

[47]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[48]  Kai Zhao,et al.  Res2Net: A New Multi-Scale Backbone Architecture , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[49]  Jie Tang,et al.  Can we understand van gogh's mood?: learning to infer affects from images in social networks , 2012, ACM Multimedia.

[50]  Ming-Hsuan Yang,et al.  Weakly Supervised Coupled Networks for Visual Sentiment Analysis , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.