Abnormal Behavior Detection of ATM Surveillance Videos Based on Pseudo-3D Residual Network

Surveillance videos from self-service banks are a major source of social big data and play an important role in public security. Manual screening of these video streams is costly and slow, so automated video analysis is required. Among current automated methods, 3D convolutional networks are effective for video analysis but incur substantial time and memory overhead. In this paper, we propose to apply the Pseudo-3D Residual Network (P3D ResNet), which replaces each 3×3×3 convolution with cheaper pseudo-3D convolutions and thereby reduces the amount of computation. To improve the accuracy of P3D ResNet on our videos, we additionally preprocess them by extracting human pose information with the OpenPose tool. Combining the P3D ResNet model with OpenPose preprocessing, we build an abnormal behavior detection prototype for surveillance videos of self-service banks. With the help of the pose information, the accuracy of behavior classification on real bank videos, which cannot be disclosed to the public, improves by at least 7%. So that readers can reproduce our experiments, we further evaluate the prototype on self-made videos that emulate bank settings, where the accuracy of our system improves by 11%.
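The computational saving from replacing a 3×3×3 convolution can be illustrated with a simple weight count. The sketch below assumes the decomposition used in the P3D ResNet design, a 1×3×3 spatial convolution followed by a 3×1×1 temporal convolution. It also assumes equal input and output channel widths and ignores biases. The function names and the channel width of 64 are illustrative only and are not taken from the prototype described here. Actual P3D blocks sit inside bottleneck structures, so real savings will differ.

```python
def conv3d_weights(c_in, c_out, k_t, k_h, k_w):
    """Number of weights in a 3D convolution kernel (bias ignored)."""
    return c_in * c_out * k_t * k_h * k_w


def full_3d_weights(channels):
    """Weights of one standard 3x3x3 convolution."""
    return conv3d_weights(channels, channels, 3, 3, 3)


def pseudo_3d_weights(channels):
    """Weights of a 1x3x3 spatial conv plus a 3x1x1 temporal conv."""
    spatial = conv3d_weights(channels, channels, 1, 3, 3)
    temporal = conv3d_weights(channels, channels, 3, 1, 1)
    return spatial + temporal


if __name__ == "__main__":
    channels = 64  # assumed width, for illustration only
    full = full_3d_weights(channels)
    pseudo = pseudo_3d_weights(channels)
    print(f"3x3x3 weights:     {full}")
    print(f"pseudo-3D weights: {pseudo}")
    print(f"ratio:             {pseudo / full:.3f}")
```

Under these assumptions the pseudo-3D pair needs 12 weights per input-output channel pair instead of 27, which is a little under half of the full 3D kernel.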
