Paper Information - Detecting Violent Scenes in Movies by Auditory and Visual Cues

Detecting Violent Scenes in Movies by Auditory and Visual Cues

To detect violence in movies, we present a three-stage method that integrates visual and auditory cues. First, shots with potentially violent content are identified according to universal film-making rules. A modified semi-supervised learning technique based on semi-supervised cross feature learning (SCFL) is used, since it can combine different types of features and exploit unlabeled data to improve classification performance. Second, typical violence-related audio effects are detected in the candidate shots, and the confidences output by the classifiers of the various audio events are transformed into a shot-based violence score. Finally, the probabilistic outputs of the first two stages are integrated in a boosting manner to generate the final inference. Experimental results on four typical action movies provide preliminary evidence of the method's effectiveness.
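The final fusion step can be illustrated with a minimal sketch. The abstract does not give the actual weighting scheme, so the code below assumes an AdaBoost-style weight computed from each stage's error rate on labeled shots; the names `stage_weight` and `fuse`, the 0.5 threshold, and the sample numbers are all hypothetical and are not taken from the paper.

```python
import math

def stage_weight(probs, labels, threshold=0.5):
    """AdaBoost-style weight for one stage (an assumption, not the paper's rule).

    probs  : per-shot violence probabilities produced by the stage
    labels : ground-truth labels for the same shots (1 = violent, 0 = not)
    """
    errors = sum((p >= threshold) != bool(y) for p, y in zip(probs, labels))
    # Clamp the error rate so the logarithm below is always defined.
    err = min(max(errors / len(labels), 1e-6), 1 - 1e-6)
    return 0.5 * math.log((1 - err) / err)

def fuse(visual_prob, audio_score, w_visual, w_audio):
    """Weighted vote over the two stages; returns (is_violent, margin)."""
    # Map each probability from [0, 1] to a signed vote in [-1, 1].
    margin = w_visual * (2 * visual_prob - 1) + w_audio * (2 * audio_score - 1)
    return margin > 0, margin

# Hypothetical validation data: the visual stage misclassifies one of four shots.
w_v = stage_weight([0.9, 0.2, 0.8, 0.4], [1, 0, 1, 1])
decision, margin = fuse(visual_prob=0.9, audio_score=0.3, w_visual=1.0, w_audio=0.5)
```

A more accurate stage receives a larger weight, so its output dominates the vote when the visual and audio stages disagree.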

[1] Rong Yan, et al. Semi-supervised cross feature learning for semantic concept detection in videos, 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[2] Wen-Huang Cheng, et al. Semantic context detection based on hierarchical audio models, 2003, MIR '03.

[3] David Bordwell, et al. Film Art: An Introduction, 1979.

[4] Lie Lu, et al. Content analysis for audio classification and segmentation, 2002, IEEE Trans. Speech Audio Process.

[5] Douglas Keislar, et al. Content-Based Classification, Search, and Retrieval of Audio, 1996, IEEE Multim.

[6] Jeho Nam, et al. Audio-visual content-based violent scene characterization, 1998, Proceedings 1998 International Conference on Image Processing. ICIP98 (Cat. No.98CB36269).

[7] Lie Lu, et al. Automatic mood detection and tracking of music audio signals, 2006, IEEE Transactions on Audio, Speech, and Language Processing.

[8] Lie Lu, et al. A flexible framework for key audio effects detection and auditory context inference, 2006, IEEE Transactions on Audio, Speech, and Language Processing.

[9] Chih-Jen Lin, et al. Probability Estimates for Multi-class Classification by Pairwise Coupling, 2003, J. Mach. Learn. Res.

[10] Alan F. Smeaton, et al. Automatically selecting shots for action movie trailers, 2006, MIR '06.

[11] Svetha Venkatesh, et al. Toward automatic extraction of expressive elements from motion pictures: tempo, 2002, IEEE Trans. Multim.

[12] Mubarak Shah, et al. Person-on-person violence detection in video data, 2002, Object recognition supported by user interaction for service robots.

[13] Chong-Wah Ngo, et al. Motion analysis and segmentation through spatio-temporal slices processing, 2003, IEEE Trans. Image Process.