Tensor-Based Emotional Category Classification via Visual Attention-Based Heterogeneous CNN Feature Fusion

This paper proposes a method for visual attention-based emotion classification through eye gaze analysis: tensor-based emotional category classification via visual attention-based heterogeneous convolutional neural network (CNN) feature fusion. Based on the relationship between human emotions and temporal changes in visual attention, the proposed method constructs a new gaze-based image representation that reflects how visual attention changes over time. Furthermore, since the emotions evoked in humans are closely related to the objects in an image, the method uses CNN models to obtain features that capture those object characteristics. To improve the ability to represent emotional categories, multiple CNN features are extracted from the gaze-based image representation and fused by constructing a novel tensor composed of these features. This tensor construction realizes visual attention-based heterogeneous CNN feature fusion, which is the main contribution of this paper. Finally, emotional category classification is performed by applying logistic tensor regression with general tensor discriminant analysis to the constructed tensor. Experimental results show that the proposed method achieves emotional category classification with an F1-measure of approximately 0.6, an improvement of about 10% over comparative methods including state-of-the-art methods, verifying its effectiveness.
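As a rough illustration of the heterogeneous feature fusion described above, the sketch below stacks feature vectors from two CNN backbones into a third-order tensor and trains a classifier on its unfolding. This is a minimal sketch under stated assumptions: the helper `build_feature_tensor`, the toy feature arrays, and the use of scikit-learn's plain logistic regression are placeholders introduced here for illustration only; the paper itself applies logistic tensor regression with general tensor discriminant analysis directly to the tensor structure, which is not reproduced in this snippet.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def build_feature_tensor(feature_sets):
    """Stack per-backbone CNN feature matrices into a third-order tensor of
    shape (num_samples, num_backbones, feature_dim). Assumes every backbone's
    features have already been mapped to a common feature_dim."""
    # feature_sets: list of arrays, each (num_samples, feature_dim),
    # e.g., features extracted from gaze-based image representations.
    return np.stack(feature_sets, axis=1)

# Toy data standing in for CNN features of gaze-based image representations.
rng = np.random.default_rng(0)
num_samples, feature_dim = 200, 128
features_cnn_a = rng.normal(size=(num_samples, feature_dim))  # backbone A
features_cnn_b = rng.normal(size=(num_samples, feature_dim))  # backbone B
labels = rng.integers(0, 8, size=num_samples)  # e.g., eight emotional categories

tensor = build_feature_tensor([features_cnn_a, features_cnn_b])

# Unfold the tensor to (num_samples, num_backbones * feature_dim) for a plain
# multinomial logistic regression; the paper instead keeps the tensor structure
# and uses GTDA followed by logistic tensor regression.
unfolded = tensor.reshape(num_samples, -1)
clf = LogisticRegression(max_iter=1000)
clf.fit(unfolded, labels)
print("training accuracy:", clf.score(unfolded, labels))
```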
