Detecting Violence in Video using Subclasses

This paper attacks the challenging problem of violence detection in videos. Unlike existing works that focus on combining multi-modal features, we go one step further by adding and exploiting subclasses visually related to violence. We enrich the MediaEval 2015 violence dataset by manually labeling violent videos with respect to these subclasses. Such fine-grained annotations not only help us understand what has impeded previous efforts to learn a fusion of multi-modal features, but also enhance the generalization ability of the learned fusion to novel test data. The new subclass-based solution, with an AP of 0.303 and a P100 of 0.55 on the MediaEval 2015 test set, outperforms the state of the art. Note that our solution does not require fine-grained annotations on the test set, so it can be applied directly to novel, fully unlabeled videos. Interestingly, our study shows that motion-related features (MBH, HOG, and HOF), though an essential part of previous systems, are seemingly dispensable. Data is available at http://lixirong.net/datasets/mm2016vsd
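AP and P100 are ranking metrics computed over the test videos once they are sorted by predicted violence score. The following is a minimal sketch of how they can be computed. It assumes the standard textbook definitions: non-interpolated average precision, and precision over the top 100 ranked videos. The official MediaEval 2015 scoring may differ in details such as tie handling or truncation depth, and the function names here are hypothetical.

```python
def rank_labels(scores, labels):
    """Sort binary labels (1 = violent) by descending classifier score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [labels[i] for i in order]


def average_precision(ranked):
    """Non-interpolated AP: mean of precision@rank at each violent video."""
    total_pos = sum(ranked)
    if total_pos == 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(ranked, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant position
    return precision_sum / total_pos


def precision_at_k(ranked, k=100):
    """Fraction of violent videos among the top-k results (P100 when k=100)."""
    return sum(ranked[:k]) / k


# Toy example: four videos, two of which are violent.
ranked = rank_labels([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
print(average_precision(ranked))      # (1/1 + 2/3) / 2
print(precision_at_k(ranked, k=2))    # 1 violent video in the top 2
```

Because AP rewards placing every violent video early in the ranking while P100 looks only at the head of the list, the two numbers can move independently when comparing fusion strategies.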
