Deep Multi-view Representation Learning for Video Anomaly Detection Using Spatiotemporal Autoencoders

Visual perception is a transformative technology that recognizes patterns in an environment from visual input. Automatic surveillance of human activities has gained significant importance in both public and private spaces. In real-world scenarios, the complex dynamics of events are often difficult to understand in real time because of camera movement, cluttered backgrounds, and occlusion. Existing anomaly detection systems do not perform well because activities exhibit high intra-class variation and high inter-class similarity. Hence, there is a need to exploit the different kinds of information that can be extracted from surveillance videos to improve overall performance. This can be achieved by learning features from multiple forms (views) of the given raw input data. We propose two novel methods based on the multi-view representation learning framework. The first is a hybrid multi-view representation learning method that combines deep features extracted from a 3D spatiotemporal autoencoder (3D-STAE) with robust handcrafted features based on the spatiotemporal autocorrelation of gradients. The second is a deep multi-view representation learning method that combines deep features extracted from two-stream STAEs to detect anomalies. Results on three standard benchmark datasets, namely Avenue, Live Videos, and BEHAVE, show that the proposed multi-view representations, modeled with a one-class SVM, perform significantly better than most of the recent state-of-the-art methods.
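
The abstract does not specify how the views are combined before the one-class SVM. The sketch below assumes simple early fusion (L2-normalize each view, then concatenate) followed by scikit-learn's `OneClassSVM`, which is built on LIBSVM. The random arrays are hypothetical stand-ins for per-clip 3D-STAE bottleneck features and handcrafted gradient-autocorrelation descriptors; the dimensions, `nu`, and kernel settings are illustrative choices, not the authors' configuration. For the second (fully deep) approach, the same `fuse_views` call would simply take the features of the two STAE streams instead.

```python
import numpy as np
from sklearn.svm import OneClassSVM


def l2_normalize(features: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row (one clip's descriptor) to unit L2 norm."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    return features / np.maximum(norms, eps)


def fuse_views(*views: np.ndarray) -> np.ndarray:
    """Early fusion (an assumption): normalize each view, then concatenate.

    Normalizing per view keeps one view from dominating the RBF kernel
    distance merely because of its dimensionality or numeric scale.
    """
    n_clips = views[0].shape[0]
    if any(v.shape[0] != n_clips for v in views):
        raise ValueError("all views must describe the same clips")
    return np.hstack([l2_normalize(v) for v in views])


rng = np.random.default_rng(0)

# ASSUMPTION: synthetic stand-ins for real descriptors. 200 training clips
# of normal activity, each with a 64-d "deep" view and a 32-d "handcrafted" view.
train_deep = rng.normal(0.0, 1.0, size=(200, 64)) + 3.0
train_hand = rng.normal(0.0, 1.0, size=(200, 32)) + 3.0
X_train = fuse_views(train_deep, train_hand)

# nu upper-bounds the fraction of training clips allowed outside the boundary.
detector = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale")
detector.fit(X_train)  # fitted on normal clips only

# Test clips: 5 drawn like the training data, 5 from a shifted distribution.
test_deep = np.vstack([rng.normal(0.0, 1.0, (5, 64)) + 3.0,
                       rng.normal(0.0, 1.0, (5, 64)) - 3.0])
test_hand = np.vstack([rng.normal(0.0, 1.0, (5, 32)) + 3.0,
                       rng.normal(0.0, 1.0, (5, 32)) - 3.0])
X_test = fuse_views(test_deep, test_hand)

# Signed distance to the boundary: lower means less like the normal data.
scores = detector.decision_function(X_test)
labels = detector.predict(X_test)  # +1 = normal, -1 = anomalous
```

Because the SVM only ever sees normal clips during training, anomalies are whatever falls outside the learned boundary; in a real pipeline the per-clip scores would typically be thresholded or fed to a frame-level ROC/AUC evaluation.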
