Violence Detection in Video by Using 3D Convolutional Neural Networks

Whereas most research addresses the action recognition problem, the detection of fights has received comparatively little attention, even though such a capability may be of great importance. Typical methods rely mostly on domain knowledge to construct complex handcrafted features from the inputs. In contrast, deep models can act directly on the raw inputs and automatically extract features. In this paper we therefore develop a novel 3D ConvNets model for violence detection in video that does not use any prior knowledge. To evaluate our method, experimental validation was conducted on the Hockey dataset. The results show that the method achieves superior performance without relying on handcrafted features.
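To illustrate what acting directly on raw video means for a 3D convolution, the following minimal sketch slides a single 3D kernel over a stack of frames, so that one filter spans both space and time. It is not the model developed in the paper: the function name `conv3d_valid`, the toy clip, and the temporal-difference kernel are all assumptions chosen for illustration, and a real network would learn many such kernels from data.

```python
import numpy as np

def conv3d_valid(clip, kernel):
    """Slide one 3D kernel over a (frames, height, width) clip without padding.

    As is conventional in ConvNets, this computes a cross-correlation
    (the kernel is not flipped).
    """
    t, h, w = clip.shape
    kt, kh, kw = kernel.shape
    out = np.zeros((t - kt + 1, h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for k in range(out.shape[2]):
                patch = clip[i:i + kt, j:j + kh, k:k + kw]
                out[i, j, k] = np.sum(patch * kernel)
    return out

# Hypothetical toy clip: 5 grayscale frames of 6x6 pixels,
# with one bright pixel moving one column to the right per frame.
clip = np.zeros((5, 6, 6))
for frame in range(5):
    clip[frame, 3, frame] = 1.0

# Hypothetical hand-set kernel spanning 2 frames: next frame minus current frame.
# A trained 3D ConvNet would learn its kernels instead of using a fixed one.
kernel = np.zeros((2, 1, 1))
kernel[0, 0, 0] = -1.0
kernel[1, 0, 0] = 1.0

response = conv3d_valid(clip, kernel)
print(response.shape)  # (4, 6, 6)
```

Because the kernel extends along the time axis, the response is nonzero only where the pixel values change between consecutive frames, which is the kind of motion cue a 2D convolution on a single frame cannot capture.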
