A Violence Detection Approach Based on Spatio-temporal Hypergraph Transition

In the field of activity recognition, violence detection is one of the most challenging tasks due to the variety of action patterns and the lack of training data. Over the last decade, performance has improved through the use of local spatio-temporal features. However, the geometric relationships among these features and their transition processes have not been fully exploited. In this paper, we propose a novel framework based on spatio-temporal hypergraph transition. First, we use hypergraphs to represent the geometric relationships among spatio-temporal features in a single frame. Then, we apply a new descriptor called Histogram of Velocity Change (HVC), which characterizes the intensity of motion change, to model hypergraph transitions between consecutive frames. Finally, we adopt Hidden Markov Models (HMMs) with the hypergraphs and the descriptors to detect and localize violence in video frames. Experimental results on the BEHAVE and UT-Interaction datasets show that the proposed framework outperforms existing methods.
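
To make the idea of a velocity-change histogram more concrete, the following is a minimal sketch and not the HVC descriptor as defined in the paper, whose exact formulation is not given in this abstract. It assumes that "velocity change" can be approximated by the magnitude of the second difference of a tracked point's position across consecutive frames, and that these magnitudes are binned into a normalized histogram. The function name `velocity_change_histogram` and the parameters `n_bins` and `max_change` are hypothetical.

```python
import math


def velocity_change_histogram(track, n_bins=8, max_change=10.0):
    """Illustrative histogram of velocity-change magnitudes for one point.

    track: list of (x, y) positions of a tracked point in consecutive frames.
    n_bins, max_change: hypothetical binning parameters (assumptions).
    Returns a list of n_bins values that sum to 1.
    """
    if len(track) < 3:
        raise ValueError("need at least three positions to measure a velocity change")

    # First differences: displacement per frame, used here as velocity.
    velocities = [
        (x1 - x0, y1 - y0)
        for (x0, y0), (x1, y1) in zip(track, track[1:])
    ]

    # Second differences: magnitude of the change in velocity between frames.
    changes = [
        math.hypot(vx1 - vx0, vy1 - vy0)
        for (vx0, vy0), (vx1, vy1) in zip(velocities, velocities[1:])
    ]

    # Bin the magnitudes; values at or above max_change fall in the last bin.
    bin_width = max_change / n_bins
    counts = [0] * n_bins
    for change in changes:
        index = min(int(change / bin_width), n_bins - 1)
        counts[index] += 1

    total = len(changes)
    return [count / total for count in counts]


# A point moving at constant velocity versus one that accelerates abruptly.
steady = velocity_change_histogram([(0, 0), (1, 0), (2, 0), (3, 0)])
jerky = velocity_change_histogram([(0, 0), (1, 0), (4, 0)])
```

Under these assumptions, smooth motion concentrates mass in the lowest bin, while abrupt motion shifts mass toward higher bins. A sequence of such per-frame histograms is the kind of observation that a temporal model such as an HMM could consume.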
