Remembering history with convolutional LSTM for anomaly detection

This paper tackles anomaly detection in videos, which is an extremely challenging task because anomaly is unbounded. We approach this task by leveraging a Convolutional Neural Network (CNN or ConvNet) for appearance encoding for each frame, and leveraging a Convolutional Long Short Term Memory (ConvLSTM) for memorizing all past frames which corresponds to the motion information. Then we integrate ConvNet and ConvLSTM with Auto-Encoder, which is referred to as ConvLSTM-AE, to learn the regularity of appearance and motion for the ordinary moments. Compared with 3D Convolutional Auto-Encoder based anomaly detection, our main contribution lies in that we propose a ConvLSTM-AE framework which better encodes the change of appearance and motion for normal events, respectively. To evaluate our method, we first conduct experiments on a synthesized Moving-MNIST dataset under controlled settings, and results show that our method can easily identify the change of appearance and motion. Extensive experiments on real anomaly datasets further validate the effectiveness of our method for anomaly detection.

[1]  Yoshua Bengio,et al.  Understanding the difficulty of training deep feedforward neural networks , 2010, AISTATS.

[2]  Ehud Rivlin,et al.  Robust Real-Time Unusual Event Detection using Multiple Fixed-Location Monitors , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3]  Marc'Aurelio Ranzato,et al.  Video (language) modeling: a baseline for generative models of natural videos , 2014, ArXiv.

[4]  Andrew Zisserman,et al.  Convolutional Two-Stream Network Fusion for Video Action Recognition , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[6]  Luc Van Gool,et al.  An Efficient Dense and Scale-Invariant Spatio-Temporal Interest Point Detector , 2008, ECCV.

[7]  Samy Bengio,et al.  Semi-supervised adapted HMMs for unusual event detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[8]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[9]  Nicu Sebe,et al.  Learning Deep Representations of Appearance and Motion for Anomalous Event Detection , 2015, BMVC.

[10]  Dit-Yan Yeung,et al.  Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting , 2015, NIPS.

[11]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[12]  Kejun Wang,et al.  Video-Based Abnormal Human Behavior Recognition—A Review , 2012, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).

[13]  Serge J. Belongie,et al.  Behavior recognition via sparse spatio-temporal features , 2005, 2005 IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance.

[14]  Matthew D. Zeiler ADADELTA: An Adaptive Learning Rate Method , 2012, ArXiv.

[15]  Lorenzo Torresani,et al.  Learning Spatiotemporal Features with 3D Convolutional Networks , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[16]  Cewu Lu,et al.  Abnormal Event Detection at 150 FPS in MATLAB , 2013, 2013 IEEE International Conference on Computer Vision.

[17]  Martial Hebert,et al.  A Discriminative Framework for Anomaly Detection in Large Videos , 2016, ECCV.

[18]  Cordelia Schmid,et al.  A Spatio-Temporal Descriptor Based on 3D-Gradients , 2008, BMVC.

[19]  Fei-Fei Li,et al.  Online detection of unusual events in videos via dynamic sparse coding , 2011, CVPR 2011.

[20]  Jonghyun Choi,et al.  Learning Temporal Regularity in Video Sequences , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Trevor Darrell,et al.  Long-term recurrent convolutional networks for visual recognition and description , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Ming Yang,et al.  3D Convolutional Neural Networks for Human Action Recognition , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[23]  Nuno Vasconcelos,et al.  Anomaly detection in crowded scenes , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[24]  Daniel Roggen,et al.  Deep Convolutional and LSTM Recurrent Neural Networks for Multimodal Wearable Activity Recognition , 2016, Sensors.