A unified framework of deep networks for genre classification using movie trailer

Abstract Affective video content analysis has emerged as a challenging and essential research task, as it aims to automatically analyze the emotions elicited by videos. However, little progress has been achieved in this field due to the enigmatic nature of emotions, which widens the gap between the human affective state and the structure of the video. In this paper, we propose a novel deep affect-based movie trailer classification framework. We also develop the EmoGDB dataset, which contains 100 Bollywood movie trailers annotated with popular movie genres (Action, Comedy, Drama, Horror, Romance, Thriller) and six types of induced emotions (Anger, Fear, Happy, Neutral, Sad, Surprise). The affect-based features are learned via the ILDNet architecture trained on the EmoGDB dataset. Our work aims to analyze the relationship between the emotions elicited by movie trailers and how these emotions contribute to solving the multi-label genre classification problem. The proposed framework is validated by cross-dataset testing on three large-scale datasets, namely LMTD-9, MMTF-14K, and ML-25M. Extensive experiments show that the proposed algorithm significantly outperforms all state-of-the-art methods in terms of precision, recall, F1 score, precision–recall curves (PRC), and area under the PRC.
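Since each trailer can belong to several genres at once, evaluation follows the multi-label convention hinted at in the abstract. A minimal sketch of how such metrics can be computed with scikit-learn is shown below; the genre list matches the paper, but the toy labels, probabilities, and the 0.5 decision threshold are illustrative assumptions, not the authors' actual pipeline.

```python
import numpy as np
from sklearn.metrics import (precision_score, recall_score,
                             f1_score, average_precision_score)

# Genre vocabulary from the EmoGDB annotations described in the abstract.
GENRES = ["Action", "Comedy", "Drama", "Horror", "Romance", "Thriller"]

# Hypothetical ground truth for 4 trailers: each row is a binary genre vector
# (a trailer may carry multiple genres simultaneously).
y_true = np.array([[1, 0, 1, 0, 0, 0],
                   [0, 1, 1, 0, 1, 0],
                   [0, 0, 0, 1, 0, 1],
                   [1, 0, 0, 0, 0, 1]])

# Hypothetical per-genre probabilities from a trained classifier.
y_prob = np.array([[0.9, 0.1, 0.8, 0.2, 0.1, 0.3],
                   [0.2, 0.7, 0.6, 0.1, 0.8, 0.2],
                   [0.1, 0.2, 0.3, 0.9, 0.1, 0.7],
                   [0.8, 0.1, 0.2, 0.1, 0.2, 0.6]])

# Threshold probabilities to obtain hard multi-label predictions.
y_pred = (y_prob >= 0.5).astype(int)

# Micro-averaged metrics aggregate over all (trailer, genre) decisions.
precision = precision_score(y_true, y_pred, average="micro")
recall = recall_score(y_true, y_pred, average="micro")
f1 = f1_score(y_true, y_pred, average="micro")

# Area under the precision-recall curve uses the raw probabilities.
auprc = average_precision_score(y_true, y_prob, average="micro")
```

Micro-averaging is a common choice for genre classification because it weights each individual trailer–genre decision equally, which keeps rare genres from dominating the score; macro-averaging is the usual alternative when per-genre balance matters.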
