A Benchmarking Campaign for the Multimodal Detection of Violent Scenes in Movies

We present an international benchmark on the detection of violent scenes in movies, implemented as part of the multimedia benchmarking initiative MediaEval 2011. The task consists of detecting portions of movies where physical violence is present, based on automatic analysis of the video, sound, and subtitle tracks. A dataset of 15 Hollywood movies was carefully annotated and divided into a development set of 12 movies and a test set of 3 movies. Annotation strategies and the resolution of borderline cases are discussed at length in the paper. Results from 29 runs submitted by the 6 participating sites are analyzed. The first year's results are promising, but given the use case, there is still considerable room for improvement. The detailed analysis of the 2011 benchmark provides valuable insight for the design of future evaluations of violent scene detection in movies.
