Predicting Evoked Emotions in Video

Understanding how visual content evokes human emotion is a task that people perform every day, but one that machines have not yet mastered. In this work we address the problem of predicting the intended evoked emotion at given points within movie trailers. Movie trailers are carefully curated to elicit distinct, specific emotional responses from viewers, and are therefore well suited to emotion prediction. However, current emotion recognition systems struggle to bridge the "affective gap": the difficulty of modeling high-level human emotions with low-level audio and visual features. To address this problem, we propose a mid-level concept feature based on detectable movie-shot concepts that we believe are closely tied to emotions; examples of these concepts are "Fight", "Rock Music", and "Kiss". We also create two datasets: the first contains shot-level concept annotations for learning our concept detectors, and the second, separate dataset contains emotion annotations collected throughout the trailers using the two-dimensional arousal-valence model of emotion. We report the performance of our concept detectors and show that using the outputs of these detectors as a mid-level representation of movie shots predicts the evoked emotion throughout a trailer more accurately than using low-level features.
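The pipeline the abstract describes can be sketched in two stages: first train one binary detector per shot concept, then use the stacked detector scores as a mid-level feature vector for regressing arousal and valence. The sketch below is a minimal illustration with scikit-learn and synthetic data; the feature dimensions, the `concept_features` helper, and the use of linear SVMs and ridge regression are assumptions for demonstration, not the paper's exact models.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Synthetic stand-ins for low-level shot features (e.g. audio/visual descriptors).
n_shots, n_dims = 200, 64
concepts = ["fight", "rock_music", "kiss"]  # example concepts from the abstract
X = rng.normal(size=(n_shots, n_dims))
concept_labels = {c: (rng.random(n_shots) < 0.3).astype(int) for c in concepts}

# Stage 1: one binary concept detector per concept (linear SVM as a stand-in).
detectors = {c: LinearSVC(dual=False).fit(X, y) for c, y in concept_labels.items()}

def concept_features(X_shots):
    """Mid-level representation: one detector score per concept, per shot."""
    return np.column_stack(
        [detectors[c].decision_function(X_shots) for c in concepts]
    )

Z = concept_features(X)  # shape: (n_shots, n_concepts)

# Stage 2: regress arousal and valence from the mid-level concept features.
arousal = rng.uniform(-1.0, 1.0, n_shots)  # placeholder emotion annotations
valence = rng.uniform(-1.0, 1.0, n_shots)
arousal_model = Ridge().fit(Z, arousal)
valence_model = Ridge().fit(Z, valence)

pred_arousal = arousal_model.predict(concept_features(X[:5]))
```

The key design point is that the regressors never see the low-level features directly: every shot is first compressed into a small vector of concept scores, which is what lets the mid-level representation bridge the affective gap described above.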
