Movie Emotion Estimation with Multimodal Fusion and Synthetic Data Generation

In this work, we propose a method for automatic emotion recognition from movie clips. The problem has applications in indexing and retrieval of large movie and video collections, summarization of visual content, and selection of emotion-invoking material. Our approach estimates valence and arousal values automatically. We extract audio and visual features and summarize them via functionals, PCA, and Fisher vector encoding, followed by feature selection based on canonical correlation analysis. For classification, we use extreme learning machines and support vector machines. We evaluate our approach on the LIRIS-ACCEDE database, which provides ground truth annotations, and address the class imbalance problem by generating synthetic data. By fusing the best features at the score and feature levels, we obtain good results on this problem, especially for valence prediction.
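The synthetic-data step can be illustrated with a minimal SMOTE-style interpolation sketch in NumPy. This is an illustrative simplification, not the method used in the paper (which relies on adaptive synthetic sampling); the function name and parameters below are hypothetical:

```python
import numpy as np

def oversample_minority(X_min, n_new, k=5, rng=None):
    """Generate synthetic minority-class samples by interpolating between
    each seed sample and one of its k nearest minority neighbours.
    NOTE: a SMOTE-style sketch for illustration; the paper uses an
    adaptive synthetic sampling scheme instead."""
    rng = np.random.default_rng(rng)
    n = len(X_min)
    # pairwise Euclidean distances within the minority class
    d = np.linalg.norm(X_min[:, None] - X_min[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # a sample is not its own neighbour
    k = min(k, n - 1)
    neigh = np.argsort(d, axis=1)[:, :k]  # k nearest neighbours per sample
    base = rng.integers(0, n, size=n_new)              # random seed samples
    nb = neigh[base, rng.integers(0, k, size=n_new)]   # one neighbour each
    gap = rng.random((n_new, 1))          # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[nb] - X_min[base])
```

Because each synthetic point lies on a segment between two real minority samples, it stays inside the per-dimension bounds of the minority class, which keeps the augmented training set plausible.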
