Graph Convolutional Label Noise Cleaner: Train a Plug-And-Play Action Classifier for Anomaly Detection

Video anomaly detection under weak labels is formulated as a typical multiple-instance learning problem in previous works. In this paper, we provide a new perspective, i.e., a supervised learning task under noisy labels. In such a viewpoint, as long as cleaning away label noise, we can directly apply fully supervised action classifiers to weakly supervised anomaly detection, and take maximum advantage of these well-developed classifiers. For this purpose, we devise a graph convolutional network to correct noisy labels. Based upon feature similarity and temporal consistency, our network propagates supervisory signals from high-confidence snippets to low-confidence ones. In this manner, the network is capable of providing cleaned supervision for action classifiers. During the test phase, we only need to obtain snippet-wise predictions from the action classifier without any extra post-processing. Extensive experiments on 3 datasets at different scales with 2 types of action classifiers demonstrate the efficacy of our method. Remarkably, we obtain the frame-level AUC score of 82.12% on UCF-Crime.

[1]  Yong Haur Tay,et al.  Abnormal Event Detection in Videos using Spatiotemporal Autoencoder , 2017, ISNN.

[2]  Qi Zhang,et al.  EM-DD: An Improved Multiple-Instance Learning Technique , 2001, NIPS.

[3]  Adam Tauman Kalai,et al.  A Note on Learning from Multiple-Instance Examples , 2004, Machine Learning.

[4]  Joan Bruna,et al.  Spectral Networks and Locally Connected Networks on Graphs , 2013, ICLR.

[5]  Abhinav Gupta,et al.  Videos as Space-Time Region Graphs , 2018, ECCV.

[6]  Eric Granger,et al.  Multiple instance learning: A survey of problem characteristics and applications , 2016, Pattern Recognit..

[7]  Hossein Mobahi,et al.  Deep learning from temporal coherence in video , 2009, ICML '09.

[8]  Luca Antiga,et al.  Automatic differentiation in PyTorch , 2017 .

[9]  Yale Song,et al.  Learning from Noisy Labels with Distillation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[10]  Trevor Darrell,et al.  Auxiliary Image Regularization for Deep CNNs with Noisy Labels , 2015, ICLR.

[11]  Gilles Blanchard,et al.  Classification with Asymmetric Label Noise: Consistency and Maximal Denoising , 2013, COLT.

[12]  Jian Pei,et al.  Asymmetric Transitivity Preserving Graph Embedding , 2016, KDD.

[13]  Tao Zhang,et al.  Step-by-step Erasion, One-by-one Collection: A Weakly Supervised Temporal Action Detector , 2018, ACM Multimedia.

[14]  Lorenzo Torresani,et al.  Learning Spatiotemporal Features with 3D Convolutional Networks , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[15]  Nicu Sebe,et al.  Detecting anomalous events in videos by learning deep representations of appearance and motion , 2017, Comput. Vis. Image Underst..

[16]  Kristen Grauman,et al.  Observe locally, infer globally: A space-time MRF for detecting abnormal activities with incremental updates , 2009, CVPR.

[17]  Jennifer Neville,et al.  Attributed graph models: modeling network structure with correlated attributes , 2014, WWW.

[18]  Timo Aila,et al.  Temporal Ensembling for Semi-Supervised Learning , 2016, ICLR.

[19]  Ramakant Nevatia,et al.  Cascaded Boundary Regression for Temporal Action Detection , 2017, BMVC.

[20]  Shenghua Gao,et al.  Future Frame Prediction for Anomaly Detection - A New Baseline , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[21]  Cewu Lu,et al.  Abnormal Event Detection at 150 FPS in MATLAB , 2013, 2013 IEEE International Conference on Computer Vision.

[22]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[23]  Tao Mei,et al.  Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[24]  Louis Kratz,et al.  Anomaly detection in extremely crowded scenes using spatio-temporal motion pattern models , 2009, CVPR.

[25]  Fei-Fei Li,et al.  Online detection of unusual events in videos via dynamic sparse coding , 2011, CVPR 2011.

[26]  Björn Ommer,et al.  Video parsing for abnormality detection , 2011, 2011 International Conference on Computer Vision.

[27]  Ramin Mehran,et al.  Abnormal crowd behavior detection using social force model , 2009, CVPR.

[28]  Shenghua Gao,et al.  A Revisit of Sparse Coding Based Anomaly Detection in Stacked RNN Framework , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[29]  Andrew Zisserman,et al.  Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Jure Leskovec,et al.  node2vec: Scalable Feature Learning for Networks , 2016, KDD.

[31]  Joel H. Saltz,et al.  Patch-Based Convolutional Neural Network for Whole Slide Tissue Image Classification , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Guoqing Liu,et al.  Key Instance Detection in Multi-Instance Learning , 2012, ACML.

[33]  Mubarak Shah,et al.  Abnormal crowd behavior detection using social force model , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[34]  Nicu Sebe,et al.  Plug-and-Play CNN for Crowd Motion Analysis: An Application in Abnormal Event Detection , 2016, 2018 IEEE Winter Conference on Applications of Computer Vision (WACV).

[35]  Andrew Zisserman,et al.  Convolutional Two-Stream Network Fusion for Video Action Recognition , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  Naftali Tishby,et al.  Multi-instance learning with any hypothesis class , 2011, J. Mach. Learn. Res..

[37]  Junsong Yuan,et al.  Sparse reconstruction cost for abnormal event detection , 2011, CVPR 2011.

[38]  Lars Kai Hansen,et al.  Design of robust neural network classifiers , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).

[39]  Christophe Rosenberger,et al.  Abnormal events detection based on spatio-temporal co-occurences , 2009, CVPR.

[40]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[41]  Richard Nock,et al.  Making Deep Neural Networks Robust to Label Noise: A Loss Correction Approach , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[42]  Limin Wang,et al.  Temporal Action Detection with Structured Segment Networks , 2017, International Journal of Computer Vision.

[43]  Shaogang Gong,et al.  A Markov Clustering Topic Model for mining behaviour in video , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[44]  Jacob Goldberger,et al.  Training deep neural-networks using a noise adaptation layer , 2016, ICLR.

[45]  Nuno Vasconcelos,et al.  Anomaly Detection and Localization in Crowded Scenes , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[46]  Sandip S. Patil,et al.  Tracking and identification of suspicious and abnormal behaviors using supervised machine learning technique , 2009, ICAC3 '09.

[47]  Andrew Zisserman,et al.  Two-Stream Convolutional Networks for Action Recognition in Videos , 2014, NIPS.

[48]  Cordelia Schmid,et al.  Action Recognition with Improved Trajectories , 2013, 2013 IEEE International Conference on Computer Vision.

[49]  Mubarak Shah,et al.  Real-World Anomaly Detection in Surveillance Videos , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[50]  Wen-Hsien Fang,et al.  Video anomaly detection and localization using hierarchical feature representation and Gaussian process regression , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[51]  Nitish Srivastava,et al.  Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..

[52]  Max Welling,et al.  Semi-Supervised Classification with Graph Convolutional Networks , 2016, ICLR.

[53]  Fei-Fei Li,et al.  Large-Scale Video Classification with Convolutional Neural Networks , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[54]  Hao Wang,et al.  Learning with Noisy Labels for Sentence-level Sentiment Classification , 2019, EMNLP.

[55]  Limin Wang,et al.  Temporal Segment Networks for Action Recognition in Videos , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[56]  Mubarak Shah,et al.  Chaotic invariants of Lagrangian particle trajectories for anomaly detection in crowded scenes , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[57]  Lei Zhang,et al.  AutoLoc: Weakly-supervised Temporal Action Localization , 2018, ECCV.

[58]  Xu Zhao,et al.  Single Shot Temporal Action Detection , 2017, ACM Multimedia.

[59]  Alex Kendall,et al.  What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision? , 2017, NIPS.

[60]  Nuno Vasconcelos,et al.  Anomaly detection in crowded scenes , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[61]  Terrence J. Sejnowski,et al.  Slow Feature Analysis: Unsupervised Learning of Invariances , 2002, Neural Computation.

[62]  Yang Gao,et al.  Abnormal Event Detection via Multi-Instance Dictionary Learning , 2012, IDEAL.

[63]  Xavier Bresson,et al.  Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering , 2016, NIPS.

[64]  Jonghyun Choi,et al.  Learning Temporal Regularity in Video Sequences , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[65]  Yusha Liu,et al.  Classifier Two Sample Test for Video Anomaly Detections , 2018, BMVC.

[66]  Arash Vahdat,et al.  Toward Robustness against Label Noise in Training Deep Discriminative Neural Networks , 2017, NIPS.

[67]  Ehud Rivlin,et al.  Robust Real-Time Unusual Event Detection using Multiple Fixed-Location Monitors , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[68]  Alessandro Perina,et al.  Angry Crowds: Detecting Violent Events in Videos , 2016, ECCV.

[69]  R. Bellman Dynamic programming. , 1957, Science.

[70]  Shenghua Gao,et al.  Remembering history with convolutional LSTM for anomaly detection , 2017, 2017 IEEE International Conference on Multimedia and Expo (ICME).

[71]  Richard S. Zemel,et al.  Gated Graph Sequence Neural Networks , 2015, ICLR.

[72]  Nannan Li,et al.  Anomaly Detection in Video Surveillance via Gaussian Process , 2015, Int. J. Pattern Recognit. Artif. Intell..

[73]  Jie Shao,et al.  An anomaly-introduced learning method for abnormal event detection , 2018, Multimedia Tools and Applications.

[74]  Nannan Li,et al.  Multi-scale analysis of contextual information within spatio-temporal video volumes for anomaly detection , 2014, 2014 IEEE International Conference on Image Processing (ICIP).

[75]  Kristen Grauman,et al.  Slow and Steady Feature Analysis: Higher Order Temporal Coherence in Video , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).