The Violent Scenes Detection task evaluates algorithms that automatically localize violent segments in both Hollywood movies and short web videos. The task's definition of violence is subjective: “the segments that one would not let an 8 years old child see in a movie because they contain physical violence”. The problem is highly challenging because content varies strongly among the positive instances. In this year’s evaluation, we adopted our recently proposed classification method, named regularized DNN, which fuses multiple features using Deep Neural Networks (DNN). We extracted a set of visual and audio features that have been observed to be useful, then applied the regularized DNN for feature fusion and classification. Results indicate that using multiple features is still very helpful and, more importantly, that our proposed regularized DNN offers significantly better results than the popular SVM. We achieved a mean average precision of 0.63 for the main task and 0.60 for the generalization task.

1. SYSTEM DESCRIPTION

Figure 1 gives an overview of our system. In this short paper, we briefly describe each of the key components. For the task definition, data, and evaluation metric, interested readers may refer to [1].
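To make the reported mean average precision figures concrete, the following is a minimal sketch of how such a score could be computed from ranked detection scores. It assumes the standard non-interpolated average precision over binary labels (1 = violent, 0 = non-violent); the official task metric may differ in its details, and the function names `average_precision` and `mean_average_precision` are illustrative, not taken from the paper.

```python
from typing import Sequence, Tuple


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Non-interpolated average precision for one ranked list.

    Assumption: labels are 1 (violent) or 0 (non-violent), and a higher
    score means the segment is ranked as more likely violent.
    """
    # Rank segments from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            # Precision measured at the rank of each positive segment.
            precision_sum += hits / rank
    # With no positive segments, AP is defined here as 0.0 (a convention).
    return precision_sum / hits if hits else 0.0


def mean_average_precision(
    runs: Sequence[Tuple[Sequence[float], Sequence[int]]]
) -> float:
    """Mean of per-list average precision, e.g. one list per test video."""
    aps = [average_precision(scores, labels) for scores, labels in runs]
    return sum(aps) / len(aps) if aps else 0.0
```

In this sketch each element of `runs` would hold the scores and ground-truth labels for one test video, so the mean is taken over videos rather than over individual segments.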
[1] Cordelia Schmid et al. Action Recognition with Improved Trajectories. 2013 IEEE International Conference on Computer Vision, 2013.
[2] Jun Wang et al. Exploring Inter-feature and Inter-class Relationships with Deep Neural Networks for Video Classification. ACM Multimedia, 2014.
[3] Ivan Laptev et al. On Space-Time Interest Points. Proceedings Ninth IEEE International Conference on Computer Vision, 2003.
[4] Markus Schedl et al. The MediaEval 2013 Affect Task: Violent Scenes Detection. MediaEval, 2013.
[5] Chong-Wah Ngo et al. Trajectory-Based Modeling of Human Actions with Motion Reference Points. ECCV, 2012.
[6] Sam T. Roweis et al. EM Algorithms for PCA and SPCA. NIPS, 1997.
[7] Xiangyang Xue et al. Challenge Huawei challenge: Fusing multimodal features with deep neural networks for Mobile Video Annotation. 2014 IEEE International Conference on Multimedia and Expo Workshops (ICMEW), 2014.