A Multimodal Approach to Violence Detection in Video Sharing Sites

This paper presents a method for detecting violent content in video sharing sites. The proposed approach operates on a fusion of three modalities: audio, moving image, and text, the last collected from the accompanying user comments. The problem is treated as a binary classification task (violent vs. non-violent content) over a 9-dimensional feature space, in which seven of the nine features are extracted from the audio stream. The method was evaluated on 210 YouTube videos and achieved an overall accuracy of 82%.
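To make the classification setup concrete, the following is a minimal sketch of binary classification over a 9-dimensional feature space. The synthetic feature vectors and the nearest-centroid classifier are illustrative assumptions only; the abstract does not specify the paper's actual features or classifier, only the 9-D space (seven audio-derived dimensions) and the violent/non-violent labels.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 9-D feature vectors per video: seven audio features plus
# one visual and one text (comment-derived) feature. Synthetic Gaussian
# clusters stand in for real extracted features.
X_violent = rng.normal(1.0, 0.5, size=(50, 9))
X_nonviolent = rng.normal(-1.0, 0.5, size=(50, 9))

# Nearest-centroid classifier: assign a video to the class whose mean
# feature vector is closest in Euclidean distance.
c_violent = X_violent.mean(axis=0)
c_nonviolent = X_nonviolent.mean(axis=0)

def classify(x):
    """Return 1 (violent) or 0 (non-violent) for a 9-D feature vector."""
    return int(np.linalg.norm(x - c_violent) < np.linalg.norm(x - c_nonviolent))

preds = [classify(x) for x in np.vstack([X_violent, X_nonviolent])]
labels = [1] * 50 + [0] * 50
accuracy = sum(p == t for p, t in zip(preds, labels)) / len(labels)
```

On well-separated synthetic clusters such as these, training accuracy is near perfect; the 82% figure reported by the paper reflects the much harder real-world YouTube data.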
