Abnormal Event Detection in Videos using Spatiotemporal Autoencoder

We present an efficient method for detecting anomalies in videos. Recent applications of convolutional neural networks have shown promises of convolutional layers for object detection and recognition, especially in images. However, convolutional neural networks are supervised and require labels as learning signals. We propose a spatiotemporal architecture for anomaly detection in videos including crowded scenes. Our architecture includes two main components, one for spatial feature representation, and one for learning the temporal evolution of the spatial features. Experimental results on Avenue, Subway and UCSD benchmarks confirm that the detection accuracy of our method is comparable to state-of-the-art methods at a considerable speed of up to 140 fps.

[1]  Ramin Mehran,et al.  Abnormal crowd behavior detection using social force model , 2009, CVPR.

[2]  Junsong Yuan,et al.  Sparse reconstruction cost for abnormal event detection , 2011, CVPR 2011.

[3]  Ehud Rivlin,et al.  Robust Real-Time Unusual Event Detection using Multiple Fixed-Location Monitors , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[4]  Luca Viganò,et al.  Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) , 2015, IWSEC 2015.

[5]  Hongbin Zha,et al.  Anomaly Detection via Local Coordinate Factorization and Spatio-Temporal Pyramid , 2014, ACCV.

[6]  Jefferson Ryan Medel,et al.  Anomaly Detection Using Predictive Convolutional Long Short-Term Memory Units , 2016 .

[7]  Raja Bala,et al.  Adaptive Sparse Representations for Video Anomaly Detection , 2014, IEEE Transactions on Circuits and Systems for Video Technology.

[8]  Kristen Grauman,et al.  Observe locally, infer globally: A space-time MRF for detecting abnormal activities with incremental updates , 2009, CVPR.

[9]  Wei Shen,et al.  Unusual event detection in crowded scenes by trajectory analysis , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[10]  Viorica Patraucean,et al.  Spatio-temporal video autoencoder with differentiable memory , 2015, ArXiv.

[11]  Cordelia Schmid,et al.  Action and Event Recognition with Fisher Vectors on a Compact Feature Set , 2013, 2013 IEEE International Conference on Computer Vision.

[12]  Chun-Hui Wang,et al.  Abnormal Event Detection Using HOSF , 2013, 2013 International Conference on IT Convergence and Security (ICITCS).

[13]  Ming Yang,et al.  3D Convolutional Neural Networks for Human Action Recognition , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14]  Andrew Zisserman,et al.  Two-Stream Convolutional Networks for Action Recognition in Videos , 2014, NIPS.

[15]  Louis Kratz,et al.  Anomaly detection in extremely crowded scenes using spatio-temporal motion pattern models , 2009, CVPR.

[16]  Serge J. Belongie,et al.  Behavior recognition via sparse spatio-temporal features , 2005, 2005 IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance.

[17]  Lorenzo Torresani,et al.  Learning Spatiotemporal Features with 3D Convolutional Networks , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[18]  Mahmood Fathy,et al.  Real-time anomaly detection and localization in crowded scenes , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[19]  Cewu Lu,et al.  Abnormal Event Detection at 150 FPS in MATLAB , 2013, 2013 IEEE International Conference on Computer Vision.

[20]  Cordelia Schmid,et al.  Learning realistic human actions from movies , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[21]  Wei Shen,et al.  Spatial-temporal convolutional neural networks for anomaly detection and localization in crowded scenes , 2016, Signal Process. Image Commun..

[22]  Fei-Fei Li,et al.  Large-Scale Video Classification with Convolutional Neural Networks , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[23]  Martin D. Levine,et al.  An on-line, real-time learning method for detecting anomalies in videos using spatio-temporal compositions , 2013, Comput. Vis. Image Underst..

[24]  Dit-Yan Yeung,et al.  Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting , 2015, NIPS.

[25]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[26]  Ivan Laptev,et al.  Context-Aware CNNs for Person Head Detection , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[27]  Tian Wang,et al.  Histograms of optical flow orientation for abnormal events detection , 2013, 2013 IEEE International Workshop on Performance Evaluation of Tracking and Surveillance (PETS).

[28]  Fei-Fei Li,et al.  Online detection of unusual events in videos via dynamic sparse coding , 2011, CVPR 2011.

[29]  Qixiang Ye,et al.  Abnormal Behavior Detection via Sparse Reconstruction Analysis of Trajectory , 2011, 2011 Sixth International Conference on Image and Graphics.

[30]  Nuno Vasconcelos,et al.  Anomaly detection in crowded scenes , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[31]  Gian Luca Foresti,et al.  Trajectory-Based Anomalous Event Detection , 2008, IEEE Transactions on Circuits and Systems for Video Technology.

[32]  Brian C. Lovell,et al.  Improved anomaly detection in crowded scenes via cell-based analysis of foreground speed, size and texture , 2011, CVPR 2011 WORKSHOPS.

[33]  D. Pokrajac,et al.  2nd Joint IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance (VS-PETS) , 2005, Research in Microelectronics and Electronics, 2005 PhD.

[34]  Bonny Banerjee,et al.  Online Detection of Abnormal Events Using Incremental Coding Length , 2015, AAAI.

[35]  Jonghyun Choi,et al.  Learning Temporal Regularity in Video Sequences , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).