Deep convolutional framework for abnormal behavior detection in a smart surveillance system

Abstract

The ability to detect risky behavior instantly in video surveillance is a critical issue for smart surveillance systems. In this paper, a unified deep convolutional framework is proposed to detect abnormal human behavior from standard RGB images. The unified structure is intended to improve detection speed while maintaining recognition accuracy. The framework consists of (1) a human subject detection and discrimination module, which, unlike previous object detection algorithms, addresses the problem of separating individual object entities; (2) a posture classification module that extracts spatial features of abnormal behavior; and (3) an abnormal behavior detection module based on long short-term memory (LSTM). Experiments on a benchmark dataset evaluate the potential of the proposed method in the context of smart surveillance. The results indicate that the proposed method provides satisfactory performance in detecting abnormal behavior in a real-world scenario.
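The third module can be illustrated with a minimal sketch of how an LSTM turns a sequence of per-frame posture outputs into an abnormality score. This is not the authors' implementation. It assumes that the posture classification module emits one probability vector per frame for a tracked person, passes that sequence through a single standard LSTM cell (forget, input and output gates plus a candidate cell state), and maps the final hidden state to a probability with a logistic output. The names `LSTMCell` and `abnormal_score`, the 4-class posture vector, and all sizes are hypothetical. The weights are random and untrained, so the printed score only demonstrates the data flow. The detection and posture CNNs (modules 1 and 2) are not sketched.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]

GATES = ("f", "i", "o", "c")  # forget, input, output gates and cell candidate


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class LSTMCell:
    """Single standard LSTM cell; weights are random and untrained (illustration only)."""

    def __init__(self, input_size: int, hidden_size: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        # One weight matrix per gate, applied to the concatenation [x_t, h_{t-1}].
        self.W = {k: rand_matrix(hidden_size, input_size + hidden_size, rng) for k in GATES}
        self.b = {k: [0.0] * hidden_size for k in GATES}
        self.b["f"] = [1.0] * hidden_size  # forget-gate bias of 1 is a common initialization

    def step(self, x: Sequence[float], h: Vector, c: Vector) -> Tuple[Vector, Vector]:
        z = list(x) + list(h)
        pre = {k: [wv + bv for wv, bv in zip(matvec(self.W[k], z), self.b[k])] for k in GATES}
        f = [sigmoid(v) for v in pre["f"]]    # forget gate
        i = [sigmoid(v) for v in pre["i"]]    # input gate
        o = [sigmoid(v) for v in pre["o"]]    # output gate
        g = [math.tanh(v) for v in pre["c"]]  # candidate cell state
        c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
        h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
        return h_new, c_new


def abnormal_score(posture_seq: Sequence[Sequence[float]], cell: LSTMCell,
                   w_out: Vector, b_out: float = 0.0) -> float:
    """Run the LSTM over per-frame posture vectors; return P(abnormal) from the last hidden state."""
    h = [0.0] * cell.hidden_size
    c = [0.0] * cell.hidden_size
    for x in posture_seq:
        h, c = cell.step(x, h, c)
    return sigmoid(sum(w * hv for w, hv in zip(w_out, h)) + b_out)


if __name__ == "__main__":
    # Hypothetical 4-class posture distribution per frame (e.g. stand/walk/bend/lie),
    # standing in for the output of the posture classification module.
    frames = [[0.9, 0.1, 0.0, 0.0]] * 5 + [[0.1, 0.1, 0.2, 0.6]] * 5
    cell = LSTMCell(input_size=4, hidden_size=8, seed=42)
    w_out = rand_matrix(1, 8, random.Random(7))[0]
    score = abnormal_score(frames, cell, w_out)
    print(f"P(abnormal) = {score:.3f} (untrained weights, illustrative only)")
```

In a trained system, the gate and output weights would be learned from labeled normal and abnormal sequences. Scoring from the last hidden state is one common way to make a sequence-level decision. Emitting a score at every frame is an alternative that suits continuous monitoring.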

Kwang-Eun Ko, Kwee-Bo Sim
