Predicting Emotions in User-Generated Videos

User-generated video collections have been expanding rapidly in recent years, and systems for the automatic analysis of these collections are in high demand. While extensive research effort has been devoted to recognizing semantics such as "birthday party" and "skiing", few attempts have been made to understand the emotions carried by videos, e.g., "joy" and "sadness". In this paper, we propose a comprehensive computational framework for predicting emotions in user-generated videos. We first introduce a rigorously designed dataset collected from popular video-sharing websites with manual annotations, which can serve as a valuable benchmark for future research. From this dataset we extract a large set of features, ranging from popular low-level visual descriptors and audio features to high-level semantic attributes. Results of a comprehensive set of experiments indicate that combining multiple types of features, such as the joint use of audio and visual cues, is important, and that attribute features carrying sentiment-level semantics are particularly effective.
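A common way to combine multiple feature types, as the abstract describes, is late fusion: a separate classifier is trained per feature type (visual, audio, attributes), and their per-emotion scores are averaged. The sketch below illustrates this idea only; the emotion labels, score values, and equal weights are hypothetical and not taken from the paper.

```python
# Hypothetical late-fusion sketch: per-feature-type emotion scores
# (e.g., from separate classifiers) are combined by weighted averaging.
# All names and numbers here are illustrative assumptions.

EMOTIONS = ["joy", "sadness", "anger", "surprise"]

def late_fusion(score_dicts, weights=None):
    """Combine per-feature-type emotion scores by (weighted) averaging."""
    if weights is None:
        weights = [1.0] * len(score_dicts)  # equal weighting by default
    total = sum(weights)
    fused = {}
    for emo in EMOTIONS:
        fused[emo] = sum(w * s[emo] for w, s in zip(weights, score_dicts)) / total
    return fused

# Illustrative per-modality scores for a single video
visual_scores = {"joy": 0.6, "sadness": 0.1, "anger": 0.2, "surprise": 0.1}
audio_scores  = {"joy": 0.4, "sadness": 0.3, "anger": 0.1, "surprise": 0.2}
attr_scores   = {"joy": 0.7, "sadness": 0.1, "anger": 0.1, "surprise": 0.1}

fused = late_fusion([visual_scores, audio_scores, attr_scores])
prediction = max(fused, key=fused.get)  # emotion with the highest fused score
```

In practice the fusion weights would be tuned on held-out data, since some modalities (e.g., attribute features, which the paper reports as especially effective) may deserve larger weight than others.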
