Video Interestingness Prediction Based on Ranking Model

Predicting the interestingness of videos can greatly improve user satisfaction in applications such as video retrieval and recommendation. To obtain less subjective interestingness annotations, a subset of video pairs is first annotated by pairwise comparison, and all videos are then ranked globally to generate an interestingness value for each. We study two factors in interestingness prediction: comparison information and optimization of the evaluation metric. In this paper, we propose a novel deep ranking model that simulates the human annotation procedure for more reliable interestingness prediction. Specifically, we extract a variety of visual and acoustic features and sample comparison video pairs with different strategies, such as random and fixed-distance sampling. The human pairwise ranking annotations carry more information than the plain interestingness value, so we use them as richer guidance for training our networks. In addition to comparison information, we also explore a reinforcement ranking model that directly optimizes the evaluation metric. Experimental results demonstrate that the fusion of the two ranking models makes better use of human labels and outperforms the regression baseline. It also reaches the best performance according to the results of the MediaEval 2017 interestingness prediction task.
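
To make the pair sampling and pairwise supervision concrete, the following is a minimal sketch rather than the model described above. It assumes a hinge (margin) ranking loss, which the abstract does not specify, and all function names (`sample_random_pairs`, `sample_fixed_distance_pairs`, `pairwise_hinge_loss`) are hypothetical. Scores stand in for the network outputs, and labels stand in for the annotated interestingness values.

```python
import random


def sample_random_pairs(n_videos, n_pairs, rng):
    """Random strategy: draw n_pairs pairs of two distinct video indices."""
    return [tuple(rng.sample(range(n_videos), 2)) for _ in range(n_pairs)]


def sample_fixed_distance_pairs(rank_order, distance):
    """Fixed-distance strategy: pair videos whose positions in the
    ground-truth ranking differ by exactly `distance` (assumed >= 1)."""
    if distance < 1:
        raise ValueError("distance must be at least 1")
    return list(zip(rank_order[:-distance], rank_order[distance:]))


def pairwise_hinge_loss(scores, labels, pairs, margin=1.0):
    """Mean hinge loss over pairs (an assumed loss, not taken from the paper).

    A pair contributes zero loss once the more interesting video is
    scored higher than the other by at least `margin`. Tied pairs are skipped.
    """
    losses = []
    for i, j in pairs:
        if labels[i] == labels[j]:
            continue
        sign = 1.0 if labels[i] > labels[j] else -1.0
        losses.append(max(0.0, margin - sign * (scores[i] - scores[j])))
    return sum(losses) / len(losses) if losses else 0.0


# Toy usage: four videos with made-up interestingness labels.
labels = [0.9, 0.1, 0.5, 0.3]
rank_order = sorted(range(len(labels)), key=lambda k: -labels[k])
pairs = sample_fixed_distance_pairs(rank_order, 1)
pairs += sample_random_pairs(len(labels), 4, random.Random(0))
scores = [2.0, -1.0, 0.5, 0.0]  # placeholder model outputs
loss = pairwise_hinge_loss(scores, labels, pairs)
```

In this sketch the fixed-distance strategy controls how far apart the two videos of a pair sit in the global ranking, so a small distance yields harder comparisons than a large one, while random sampling mixes easy and hard pairs.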
