A knowledge transfer and boosting approach to the prediction of affect in movies

Affect prediction is a classical problem that has recently garnered special interest in multimedia applications. Affect prediction in movies is one such domain, with the potential to aid both the design and the impact analysis of movies. Given the large diversity in movies (e.g., across genres and languages), obtaining a comprehensive movie dataset for modeling affect is challenging, while models trained on smaller datasets may not generalize. In this paper, we address the problem of predicting continuous affect ratings when only limited in-domain data resources are available. We first set up several baseline models trained on in-domain data, and then propose a Knowledge Transfer (KT) + Gradient Boosting (GB) approach. KT learns models on a larger (mismatched) dataset, which are then adapted to make predictions on the data of interest. GB further updates these predictions based on models learned from the in-domain data. The KT + GB models achieve Concordance Correlation Coefficient values of 0.13 and 0.27 for valence and arousal prediction on the continuous LIRIS-ACCEDE dataset, against best baseline values of 0.12 and 0.11, respectively. Not only do the KT + GB models improve the overall performance metrics, we also observe more consistent model performance across movies of various genres.
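The paper itself does not include code; the following is a minimal sketch of the KT + GB idea, assuming scikit-learn as the learner and hypothetical arrays `X_source`/`y_source` (the larger, mismatched corpus) and `X_movie`/`y_movie` (the limited in-domain movie data). A model trained on the mismatched corpus supplies base predictions, and a gradient-boosted regressor trained on in-domain residuals corrects them:

```python
# Minimal sketch of the KT + GB pipeline described above (not the authors' code).
# X_source/y_source and X_movie/y_movie are hypothetical placeholder arrays.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X_source, y_source = rng.normal(size=(5000, 20)), rng.normal(size=5000)  # large mismatched corpus
X_movie, y_movie = rng.normal(size=(300, 20)), rng.normal(size=300)      # limited in-domain data

# Knowledge Transfer: learn on the mismatched corpus, predict on the target domain.
kt_model = Ridge(alpha=1.0).fit(X_source, y_source)

# Gradient Boosting: fit the in-domain residuals of the KT predictions,
# so the boosted model learns a correction to the transferred predictions.
X_tr, X_te, y_tr, y_te = train_test_split(X_movie, y_movie, test_size=0.5, random_state=0)
gb_model = GradientBoostingRegressor(n_estimators=100, learning_rate=0.05)
gb_model.fit(X_tr, y_tr - kt_model.predict(X_tr))

final_pred = kt_model.predict(X_te) + gb_model.predict(X_te)  # KT prediction + GB correction
```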

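For reference, the evaluation metric reported above, the Concordance Correlation Coefficient, follows Lin's standard formula and can be computed in a few lines of NumPy (again a sketch, not code from the paper):

```python
import numpy as np

def ccc(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Lin's Concordance Correlation Coefficient between two rating sequences."""
    mu_t, mu_p = y_true.mean(), y_pred.mean()
    cov = np.mean((y_true - mu_t) * (y_pred - mu_p))
    # Penalizes both low correlation and mean/scale offsets between the sequences.
    return 2.0 * cov / (y_true.var() + y_pred.var() + (mu_t - mu_p) ** 2)
```

Unlike Pearson correlation, CCC also drops when predictions are shifted or scaled relative to the ground-truth ratings, which makes it a stricter fit for continuous affect evaluation.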