Intelligent video surveillance: a review through deep learning techniques for crowd analysis

Big data applications occupy a large share of current work in industry and research. Among the many sources of big data, video streams from CCTV cameras are as important as social media data, sensor data, agricultural data, medical data, and data from space research. Surveillance videos contribute a major portion of unstructured big data. CCTV cameras are installed wherever security is important, and manual surveillance is tedious and time-consuming. Security means different things in different contexts, such as theft identification, violence detection, and the risk of explosion. In crowded public places, the term covers almost all types of abnormal events. Among these, violence detection is difficult to handle because it involves group activity. Analyzing anomalous or abnormal activity in a crowd video scene is very difficult because of several real-world constraints. This paper presents an in-depth survey that proceeds from object recognition through action recognition and crowd analysis to violence detection in a crowd environment. The majority of the papers reviewed are based on deep learning techniques. Various deep learning methods are compared in terms of their algorithms and models. The main focus of the survey is the application of deep learning techniques to detecting the exact count, the persons involved, and the activity that occurred in a large crowd under all climate conditions. The paper discusses the underlying deep learning implementation technology used in various crowd video analysis methods. Real-time processing, an important issue that remains underexplored in this field, is also considered. Few methods handle all of these issues simultaneously. The issues in existing methods are identified and summarized, and future directions are suggested for reducing the obstacles identified.
The survey provides a bibliographic summary of papers from ScienceDirect, IEEE Xplore, and the ACM Digital Library.
