Affective Video Content Analysis With Adaptive Fusion Recurrent Network

Affective video content analysis is an important research topic in video understanding with extensive applications. Intuitively, multimodal features can characterize the emotions a video elicits, and the accumulation of temporal inputs influences the viewer's emotional state. Although many methods have been proposed for this task, the adaptive weighting of modalities and the correlation among temporal inputs remain understudied. To address these issues, a novel framework is designed to learn the weights of modalities and temporal inputs directly from video data. Specifically, three network layers are designed: a statistical-data layer to improve the robustness of the input data, a temporal-adaptive-fusion layer to fuse temporal inputs, and a multimodal-adaptive-fusion layer to combine multiple modalities. In particular, the feature vectors of the three input modalities are extracted from three pre-trained convolutional neural networks and fed to three statistical-data layers. The outputs of these statistical-data layers are then passed through three separate recurrent layers, whose outputs are fed to a fully-connected layer that shares parameters across modalities and temporal inputs. Finally, the outputs of the fully-connected layer are fused by the temporal-adaptive-fusion layer and then combined by the multimodal-adaptive-fusion layer. To capture the correlations among both modalities and temporal inputs, adaptive weights for modalities and temporal inputs are introduced into the loss functions used for training, and these weights are learned by an optimization algorithm. Extensive experiments on two challenging datasets demonstrate that the proposed method outperforms baseline and other state-of-the-art methods.
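The exact layer definitions are not specified in the abstract, so the following PyTorch sketch only illustrates one plausible reading of the architecture: the statistical-data layer is assumed to standardize each time step's feature vector, and both adaptive-fusion layers are approximated here by softmax-normalized learnable weights (the paper instead injects the weights into its loss functions and learns them with an optimization algorithm). All names, dimensions, and hyperparameters below (AdaptiveFusionNet, feat_dims, hidden_dim, and so on) are hypothetical.

```python
# A minimal sketch of the described pipeline, assuming standardization for
# the statistical-data layer and softmax-weighted sums for both fusion layers.
import torch
import torch.nn as nn


class AdaptiveFusionNet(nn.Module):
    def __init__(self, feat_dims, hidden_dim=128, num_steps=10, num_classes=2):
        super().__init__()
        # One recurrent branch per modality; input feature sizes may differ.
        self.rnns = nn.ModuleList(
            [nn.LSTM(d, hidden_dim, batch_first=True) for d in feat_dims]
        )
        # Fully-connected layer shared across modalities and time steps.
        self.shared_fc = nn.Linear(hidden_dim, num_classes)
        # Learnable fusion weights over time steps and over modalities.
        self.temporal_w = nn.Parameter(torch.zeros(num_steps))
        self.modal_w = nn.Parameter(torch.zeros(len(feat_dims)))

    @staticmethod
    def statistical_layer(x, eps=1e-5):
        # Assumed form of the statistical-data layer: per-sample
        # standardization of each time step's feature vector.
        mean = x.mean(dim=-1, keepdim=True)
        std = x.std(dim=-1, keepdim=True)
        return (x - mean) / (std + eps)

    def forward(self, inputs):
        # inputs: one tensor per modality, each (batch, num_steps, feat_dim),
        # e.g. per-segment features from three pre-trained CNNs.
        per_modality = []
        for x, rnn in zip(inputs, self.rnns):
            h, _ = rnn(self.statistical_layer(x))  # (batch, T, hidden)
            logits = self.shared_fc(h)             # (batch, T, classes)
            # Temporal-adaptive fusion: weighted sum over time steps.
            t_w = torch.softmax(self.temporal_w, dim=0)
            per_modality.append(torch.einsum("btc,t->bc", logits, t_w))
        # Multimodal-adaptive fusion: weighted sum over modalities.
        m_w = torch.softmax(self.modal_w, dim=0)
        stacked = torch.stack(per_modality, dim=0)  # (M, batch, classes)
        return torch.einsum("mbc,m->bc", stacked, m_w)


# Example with hypothetical visual, audio, and motion feature dimensions.
model = AdaptiveFusionNet(feat_dims=[2048, 128, 1024])
outs = model([torch.randn(4, 10, d) for d in (2048, 128, 1024)])
print(outs.shape)  # torch.Size([4, 2])
```

Moving the fusion weights into the loss, as the paper does, would let the optimizer trade off modalities and time steps against the training objective rather than through in-network softmax normalization; the sketch above keeps them in the forward pass only for compactness.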
