Estimation of Emotions Evoked by Images Based on Multiple Gaze-based CNN Features

This paper presents a method for estimating the emotions evoked by viewing images, based on multiple visual features and their relationship with gaze information. The proposed method extracts multiple visual features from the intermediate layers of a Convolutional Neural Network (CNN). It then derives gaze-based visual features that maximize the correlation with gaze information by applying Discriminative Locality Preserving Canonical Correlation Analysis (DLPCCA). The final estimate is obtained by integrating the multiple estimation results computed from these gaze-based visual features. Because these results correspond to different semantic levels of the target image, their integration enables accurate emotion estimation.
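The pipeline described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: standard CCA stands in for DLPCCA (which additionally incorporates class-discriminative and locality-preserving terms), nearest-centroid classifiers stand in for the per-feature estimators, and the feature, gaze, and label matrices are synthetic placeholders. Per intermediate-layer feature matrix, a projection maximizing correlation with the gaze features is computed; the per-layer estimation scores are then integrated by summation.

```python
import numpy as np

def cca_projection(X, G, k, reg=1e-3):
    """Projection of X (n x dx) maximizing correlation with gaze features G (n x dg).

    Plain regularized CCA; a stand-in for DLPCCA, which adds
    class-discriminative and locality-preserving constraints.
    """
    Xc = X - X.mean(axis=0)
    Gc = G - G.mean(axis=0)
    n = X.shape[0]
    Sxx = Xc.T @ Xc / n + reg * np.eye(X.shape[1])
    Sgg = Gc.T @ Gc / n + reg * np.eye(G.shape[1])
    Sxg = Xc.T @ Gc / n

    def inv_sqrt(S):
        # Symmetric inverse square root via eigendecomposition.
        w, V = np.linalg.eigh(S)
        return V @ np.diag(1.0 / np.sqrt(np.maximum(w, 1e-12))) @ V.T

    # SVD of the whitened cross-covariance gives the canonical directions.
    M = inv_sqrt(Sxx) @ Sxg @ inv_sqrt(Sgg)
    U, _, _ = np.linalg.svd(M)
    return inv_sqrt(Sxx) @ U[:, :k]  # dx x k projection matrix

def fuse_predict(layer_feats, gaze, labels, k=2):
    """Late fusion: project each layer's features, score per layer, sum scores.

    Nearest-centroid scoring is a hypothetical simplification of the
    per-feature emotion estimators.
    """
    classes = np.unique(labels)
    scores = np.zeros((layer_feats[0].shape[0], len(classes)))
    for X in layer_feats:
        W = cca_projection(X, gaze, k)
        Z = X @ W  # gaze-based visual features for this layer
        centroids = np.stack([Z[labels == c].mean(axis=0) for c in classes])
        dist = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        scores += -dist  # closer centroid -> higher score
    return classes[scores.argmax(axis=1)]
```

Each CNN layer yields features at a different semantic level, so each projected feature set produces its own score vector; summing them implements the integration step in the simplest possible way.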
