Semi-supervised dictionary learning via local sparse constraints for violence detection

Abstract

In this paper, we propose a novel semi-supervised learning framework for violence detection in video surveillance. Within this framework, a classifier that distinguishes violent behavior from normal behavior can be trained on inexpensive unlabeled data, with human operators providing labels for a small labeled set. Our approach learns a single dictionary and a predictive linear classifier jointly. Specifically, we integrate the reconstruction error of both labeled and unlabeled data, representation constraints, and coefficient incoherence into one objective function for dictionary learning, which enhances both the representative and the discriminative power of the learned dictionary. As a result, the dictionary and the classifier learned from the labeled set yield small generalization error on unseen data. Experimental results on benchmark datasets demonstrate the effectiveness of our approach for violence detection.
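The abstract names the ingredients of the objective but not its exact form. The following is a minimal sketch of one plausible formulation, assuming: squared reconstruction error on labeled (X_l) and unlabeled (X_u) data, an l1 penalty standing in for the "representation constraints", a squared-loss linear classifier W on one-hot labels H_l, and an incoherence term that penalizes overlap between the codes of the two classes. All names (`objective`, `sparse_code`, `fit_classifier`, `predict`) and weights (`lam_u`, `alpha`, `beta`, `gamma`, `mu`) are hypothetical. ISTA is swapped in as a generic sparse coder and ridge regression as the classifier fit; neither is claimed to be the paper's solver.

```python
import numpy as np


def sparse_code(D, X, beta=0.1, n_iter=200):
    """ISTA for min_A ||X - D A||_F^2 + beta * ||A||_1 (one column per sample)."""
    L = 2.0 * np.linalg.norm(D, 2) ** 2            # Lipschitz constant of the gradient
    A = np.zeros((D.shape[1], X.shape[1]))
    for _ in range(n_iter):
        Z = A - (2.0 * D.T @ (D @ A - X)) / L       # gradient step on the fidelity term
        A = np.sign(Z) * np.maximum(np.abs(Z) - beta / L, 0.0)  # soft-thresholding
    return A


def fit_classifier(A, H, mu=1e-3):
    """Ridge solution of min_W ||H - W A||_F^2 + mu * ||W||_F^2."""
    k = A.shape[0]
    return np.linalg.solve(A @ A.T + mu * np.eye(k), A @ H.T).T


def objective(D, W, X_l, H_l, A_l, X_u, A_u,
              lam_u=1.0, alpha=1.0, beta=0.1, gamma=0.1):
    """Hypothetical semi-supervised objective; not the paper's exact formula."""
    rec = np.sum((X_l - D @ A_l) ** 2) + lam_u * np.sum((X_u - D @ A_u) ** 2)
    cls = alpha * np.sum((H_l - W @ A_l) ** 2)      # linear classification error
    sparsity = beta * (np.abs(A_l).sum() + np.abs(A_u).sum())
    y = H_l.argmax(axis=0)                          # 0 = normal, 1 = violent (assumed)
    A0, A1 = A_l[:, y == 0], A_l[:, y == 1]
    incoh = gamma * np.sum((A0.T @ A1) ** 2)        # assumed coefficient incoherence
    return rec + cls + sparsity + incoh


def predict(D, W, X, beta=0.1):
    """Code unseen samples without labels, then apply the linear classifier."""
    return (W @ sparse_code(D, X, beta)).argmax(axis=0)
```

A complete learner would alternate between updating the codes (adding the classifier and incoherence terms for labeled data), the dictionary D, and the classifier W until the objective stops decreasing; this sketch keeps D fixed and only shows how the terms fit together. Test-time coding uses reconstruction and sparsity alone because labels are unknown for unseen clips.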
