Detecting Violent Scenes in Movies by Auditory and Visual Cues

We present a three-stage method for detecting violence in movies that integrates visual and auditory cues. First, shots with potentially violent content are identified according to universal film-making rules. A modified semi-supervised learning technique based on semi-supervised cross feature learning (SCFL) is used, since it can combine different types of features and exploit unlabeled data to improve classification performance. Second, typical violence-related audio effects are detected in the candidate shots, and the confidence values output by the classifiers for the various audio events are transformed into a shot-based violence score. Finally, the probabilistic outputs of the first two stages are integrated in a boosting manner to produce the final inference. Preliminary experimental results on four typical action movies indicate the effectiveness of our method.
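
The three-stage flow described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the method itself: the function names, the dictionary layout of a shot, the thresholds, the rule that averages the per-segment peak confidence into a shot score, and the fixed weighted sum that stands in for the boosting-style integration (in practice such weights would be learned).

```python
from typing import Dict, List

# One shot is assumed to look like:
#   {"visual_prob": float, "audio": [ {event_name: confidence, ...}, ... ]}
# where "audio" holds one dict of audio-event confidences per audio segment.


def shot_audio_score(segment_confidences: List[Dict[str, float]]) -> float:
    """Collapse per-segment audio-event confidences into one shot-level score.

    Assumed rule: take the strongest event in each segment, then average
    those peaks over the shot.
    """
    if not segment_confidences:
        return 0.0
    peaks = [max(seg.values()) for seg in segment_confidences]
    return sum(peaks) / len(peaks)


def fuse(visual_prob: float, audio_score: float,
         w_visual: float = 0.5, w_audio: float = 0.5) -> float:
    """Weighted sum of the two stage outputs (placeholder for learned weights)."""
    return w_visual * visual_prob + w_audio * audio_score


def detect_violent_shots(shots: List[dict],
                         candidate_threshold: float = 0.5,
                         final_threshold: float = 0.5) -> List[int]:
    """Return the indices of shots flagged as violent."""
    flagged = []
    for index, shot in enumerate(shots):
        # Stage 1: keep only shots whose visual probability marks them as candidates.
        if shot["visual_prob"] < candidate_threshold:
            continue
        # Stage 2: score violence-related audio effects for candidate shots only.
        audio = shot_audio_score(shot["audio"])
        # Stage 3: combine both outputs and apply the final decision threshold.
        if fuse(shot["visual_prob"], audio) >= final_threshold:
            flagged.append(index)
    return flagged


if __name__ == "__main__":
    example_shots = [
        {"visual_prob": 0.875,
         "audio": [{"gunshot": 0.75, "scream": 0.25},
                   {"gunshot": 0.25, "explosion": 0.5}]},
        {"visual_prob": 0.25, "audio": [{"gunshot": 0.75}]},
        {"visual_prob": 0.5, "audio": [{"gunshot": 0.25}]},
    ]
    print(detect_violent_shots(example_shots))
```

Running audio detection only on the candidates from the first stage mirrors the ordering in the description: the cheaper visual filter narrows the set of shots before the audio classifiers are applied.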
