Computational optimization for violent scenes detection

Violent scenes detection (VSD) can be regarded as a specific instance of multimedia event detection. A popular approach to this problem is to represent videos with multiple modalities. Combining complementary modalities has been shown to improve accuracy remarkably. However, this approach also incurs a high computational cost, since all features must be processed: global and local features extracted from static frames, video sequences, and audio streams, as well as deep visual features. In this paper, we address the problem of modality selection (i.e., feature selection) when computing resources (both CPU and GPU) are limited. We evaluated possible combinations of features under different computing resource specifications. The evaluation results can be used to choose the set of features that gives the highest accuracy for a pre-selected resource budget. We conducted experiments on the MediaEval VSD 2014 benchmark dataset (60 hours in total).
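The selection step described above can be illustrated with a minimal sketch, assuming each feature has a known CPU and GPU cost and that the accuracy of every feature combination has already been measured. The feature names, cost units, and accuracy values below are hypothetical placeholders chosen for illustration; they are not results from the paper, and the exhaustive search shown here is only one plausible way to read off the best combination for a given budget.

```python
from itertools import combinations

# Hypothetical per-feature costs in arbitrary units (placeholders, not measured values).
COSTS = {
    "mfcc": {"cpu": 1.0, "gpu": 0.0},
    "trajectories": {"cpu": 8.0, "gpu": 0.0},
    "cnn": {"cpu": 1.0, "gpu": 4.0},
}

# Hypothetical accuracy of each feature combination (placeholders, not paper results).
SCORES = {
    frozenset({"mfcc"}): 0.30,
    frozenset({"trajectories"}): 0.45,
    frozenset({"cnn"}): 0.50,
    frozenset({"mfcc", "trajectories"}): 0.48,
    frozenset({"mfcc", "cnn"}): 0.55,
    frozenset({"trajectories", "cnn"}): 0.58,
    frozenset({"mfcc", "trajectories", "cnn"}): 0.60,
}


def select_features(costs, scores, cpu_budget, gpu_budget):
    """Return (best_subset, best_score) among subsets that fit both budgets.

    Returns (None, None) when no non-empty subset fits.
    """
    best_subset, best_score = None, None
    names = sorted(costs)
    # Enumerate every non-empty subset of features.
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            cpu = sum(costs[name]["cpu"] for name in combo)
            gpu = sum(costs[name]["gpu"] for name in combo)
            if cpu > cpu_budget or gpu > gpu_budget:
                continue  # this combination exceeds the available resources
            subset = frozenset(combo)
            score = scores.get(subset)
            if score is None:
                continue  # combination was not evaluated
            if best_score is None or score > best_score:
                best_subset, best_score = subset, score
    return best_subset, best_score


if __name__ == "__main__":
    # Compare a CPU-only machine with a machine that also has a GPU.
    for cpu_budget, gpu_budget in [(2.0, 0.0), (2.0, 4.0), (10.0, 4.0)]:
        subset, score = select_features(COSTS, SCORES, cpu_budget, gpu_budget)
        print(cpu_budget, gpu_budget, sorted(subset), score)
```

Exhaustive enumeration is practical only for a small number of features, since the number of subsets grows as 2^n; with many features, a greedy or knapsack-style search would be a reasonable substitute.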
