Evaluation of low-level features for detecting violent scenes in videos

Automatically detecting violent scenes in videos has great potential in several applications, such as movie selection or recommendation for children, and is an active academic research topic. Since 2011, the violent scene detection task has been one of the core tasks of MediaEval, a benchmarking initiative dedicated to evaluating new algorithms for multimedia access and retrieval. In this paper, we evaluate the performance of low-level audio and visual features for the violent scene detection task, using the datasets and evaluation protocol provided by the MediaEval organizers. The reported results can serve as a baseline for comparing new algorithms on this task.
