Video anomaly detection with multi-scale feature and temporal information fusion

Abstract

Video anomaly detection is a challenging task because abnormal events are inherently uncertain. Methods based on frame prediction currently achieve better detection results than earlier reconstruction-based or hand-crafted methods. However, existing prediction methods consider features at only a single scale and do not fully exploit temporal constraints. In this work, we propose a new framework that achieves a higher anomaly detection rate. To handle objects of different scales within each video frame, we extract features with different receptive fields so that more spatial information is encoded. We also add temporal constraints to the network instead of relying on time-consuming optical flow, and we memorize temporal features through a ConvGRU module. Furthermore, when distinguishing abnormal events, we take both temporal and spatial information into account, so that our framework fully combines spatio-temporal information to correctly separate abnormal events from normal ones. Our method achieves strong results on three datasets, demonstrating its effectiveness.
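The abstract names a ConvGRU module for memorizing temporal features but does not give its configuration. As a rough illustration only, the sketch below implements the standard ConvGRU update (GRU gating in which matrix products are replaced by convolutions) in plain Python for a single-channel feature map. It is not the authors' implementation: the single channel, 3x3 kernels, random initialization, zero biases, and the names `ConvGRUCell`, `conv2d_same`, and `elementwise` are all hypothetical choices made for this example.

```python
import math
import random

Map = list[list[float]]  # a single-channel H x W feature map (hypothetical toy layout)


def conv2d_same(x: Map, k: Map) -> Map:
    """Odd-sized 2D cross-correlation with zero padding; output has the same size as x."""
    h, w, r = len(x), len(x[0]), len(k) // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:  # zero padding outside the map
                        acc += k[di + r][dj + r] * x[ii][jj]
            out[i][j] = acc
    return out


def add(a: Map, b: Map, bias: float = 0.0) -> Map:
    return [[p + q + bias for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def mul(a: Map, b: Map) -> Map:
    return [[p * q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def elementwise(f, a: Map) -> Map:
    return [[f(v) for v in row] for row in a]


def sigmoid(v: float) -> float:
    # numerically stable logistic function
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


class ConvGRUCell:
    """Hypothetical single-channel ConvGRU cell with randomly initialized kernels."""

    def __init__(self, k: int = 3, seed: int = 0):
        rng = random.Random(seed)

        def kernel() -> Map:
            return [[rng.uniform(-0.5, 0.5) for _ in range(k)] for _ in range(k)]

        # W_* act on the input frame, U_* on the previous hidden state
        self.Wz, self.Uz = kernel(), kernel()  # update gate z
        self.Wr, self.Ur = kernel(), kernel()  # reset gate r
        self.Wn, self.Un = kernel(), kernel()  # candidate state n
        self.bz = self.br = self.bn = 0.0

    def step(self, x: Map, h: Map) -> Map:
        z = elementwise(sigmoid, add(conv2d_same(x, self.Wz), conv2d_same(h, self.Uz), self.bz))
        r = elementwise(sigmoid, add(conv2d_same(x, self.Wr), conv2d_same(h, self.Ur), self.br))
        n = elementwise(math.tanh, add(conv2d_same(x, self.Wn), conv2d_same(mul(r, h), self.Un), self.bn))
        # h_t = (1 - z) * h_{t-1} + z * n, computed per pixel
        return [[(1 - zv) * hv + zv * nv for zv, hv, nv in zip(zr, hr, nr)]
                for zr, hr, nr in zip(z, h, n)]

    def run(self, frames: list[Map]) -> Map:
        """Feed a clip frame by frame, starting from a zero hidden state; return the final state."""
        h = [[0.0] * len(frames[0][0]) for _ in frames[0]]
        for x in frames:
            h = self.step(x, h)
        return h


if __name__ == "__main__":
    rng = random.Random(1)
    clip = [[[rng.random() for _ in range(4)] for _ in range(4)] for _ in range(5)]
    state = ConvGRUCell(seed=0).run(clip)
    print(len(state), len(state[0]))  # 4 4
```

Because the hidden state starts at zero and each update is a convex combination of the previous state and a `tanh` candidate, every value of the memory stays strictly inside (-1, 1). A real model would use many channels and learned kernels; the gating logic is the same.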
