Paper information - Graph Convolutional Label Noise Cleaner: Train a Plug-And-Play Action Classifier for Anomaly Detection

Video anomaly detection under weak labels has been formulated as a typical multiple-instance learning problem in previous works. In this paper, we provide a new perspective, i.e., a supervised learning task under noisy labels. From this viewpoint, once label noise is cleaned away, we can directly apply fully supervised action classifiers to weakly supervised anomaly detection and take full advantage of these well-developed classifiers. For this purpose, we devise a graph convolutional network to correct noisy labels. Based on feature similarity and temporal consistency, our network propagates supervisory signals from high-confidence snippets to low-confidence ones. In this manner, the network is capable of providing cleaned supervision for action classifiers. During the test phase, we only need to obtain snippet-wise predictions from the action classifier, without any extra post-processing. Extensive experiments on 3 datasets at different scales with 2 types of action classifiers demonstrate the efficacy of our method. Remarkably, we obtain a frame-level AUC score of 82.12% on UCF-Crime.
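The propagation idea in the abstract can be illustrated with a small NumPy sketch. It is not the paper's network: there are no learned graph-convolution weights, and every function name, the exponential temporal kernel, the `bandwidth` and `steps` parameters, and the equal mixing of the two graphs are assumptions made for illustration. It only shows how scores might flow from high-confidence snippets to low-confidence ones over a feature-similarity graph and a temporal-consistency graph.

```python
import numpy as np


def feature_similarity_adjacency(features):
    """Row-normalized, non-negative cosine similarity between snippet features."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-12)
    sim = np.clip(unit @ unit.T, 0.0, None)
    return sim / np.maximum(sim.sum(axis=1, keepdims=True), 1e-12)


def temporal_adjacency(num_snippets, bandwidth=2.0):
    """Row-normalized weights that decay with temporal distance (assumed kernel)."""
    idx = np.arange(num_snippets)
    dist = np.abs(idx[:, None] - idx[None, :])
    weights = np.exp(-dist / bandwidth)
    return weights / weights.sum(axis=1, keepdims=True)


def propagate_scores(scores, confidence, adjacency, steps=5):
    """Blend each snippet's own score with its neighbors' average.

    Snippets with confidence 1 keep their original score; snippets with
    confidence 0 take the neighbor average entirely.
    """
    scores = np.asarray(scores, dtype=float)
    confidence = np.asarray(confidence, dtype=float)
    current = scores.copy()
    for _ in range(steps):
        neighbor_avg = adjacency @ current
        current = confidence * scores + (1.0 - confidence) * neighbor_avg
    return current


# Toy video: six snippets, two visually distinct groups of three.
features = np.array([[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 3)
noisy_scores = np.array([0.1, 0.5, 0.5, 0.5, 0.5, 0.9])
confidence = np.array([1.0, 0.0, 0.0, 0.0, 0.0, 1.0])

# Equal mixing of the two graphs is an arbitrary choice for this sketch.
adjacency = 0.5 * (feature_similarity_adjacency(features)
                   + temporal_adjacency(len(features)))
cleaned = propagate_scores(noisy_scores, confidence, adjacency)
```

In this toy setup the two confident end snippets keep their scores, while the uncertain snippets move toward the score of the confident snippet they resemble and sit close to in time. In the paper's setting, the cleaned scores would then serve as supervision for an action classifier.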

[36] Naftali Tishby,et al. Multi-instance learning with any hypothesis class , 2011, J. Mach. Learn. Res..

[37] Junsong Yuan,et al. Sparse reconstruction cost for abnormal event detection , 2011, CVPR 2011.

[38] Lars Kai Hansen,et al. Design of robust neural network classifiers , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).

[39] Christophe Rosenberger,et al. Abnormal events detection based on spatio-temporal co-occurences , 2009, CVPR.

[40] Trevor Darrell,et al. Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[41] Richard Nock,et al. Making Deep Neural Networks Robust to Label Noise: A Loss Correction Approach , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[42] Limin Wang,et al. Temporal Action Detection with Structured Segment Networks , 2017, International Journal of Computer Vision.

[43] Shaogang Gong,et al. A Markov Clustering Topic Model for mining behaviour in video , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[44] Jacob Goldberger,et al. Training deep neural-networks using a noise adaptation layer , 2016, ICLR.

[45] Nuno Vasconcelos,et al. Anomaly Detection and Localization in Crowded Scenes , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[46] Sandip S. Patil,et al. Tracking and identification of suspicious and abnormal behaviors using supervised machine learning technique , 2009, ICAC3 '09.

[47] Andrew Zisserman,et al. Two-Stream Convolutional Networks for Action Recognition in Videos , 2014, NIPS.

[48] Cordelia Schmid,et al. Action Recognition with Improved Trajectories , 2013, 2013 IEEE International Conference on Computer Vision.

[49] Mubarak Shah,et al. Real-World Anomaly Detection in Surveillance Videos , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[50] Wen-Hsien Fang,et al. Video anomaly detection and localization using hierarchical feature representation and Gaussian process regression , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[51] Nitish Srivastava,et al. Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..

[52] Max Welling,et al. Semi-Supervised Classification with Graph Convolutional Networks , 2016, ICLR.

[53] Fei-Fei Li,et al. Large-Scale Video Classification with Convolutional Neural Networks , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[54] Hao Wang,et al. Learning with Noisy Labels for Sentence-level Sentiment Classification , 2019, EMNLP.

[55] Limin Wang,et al. Temporal Segment Networks for Action Recognition in Videos , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[56] Mubarak Shah,et al. Chaotic invariants of Lagrangian particle trajectories for anomaly detection in crowded scenes , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[57] Lei Zhang,et al. AutoLoc: Weakly-supervised Temporal Action Localization , 2018, ECCV.

[58] Xu Zhao,et al. Single Shot Temporal Action Detection , 2017, ACM Multimedia.

[59] Alex Kendall,et al. What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision? , 2017, NIPS.

[60] Nuno Vasconcelos,et al. Anomaly detection in crowded scenes , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[61] Terrence J. Sejnowski,et al. Slow Feature Analysis: Unsupervised Learning of Invariances , 2002, Neural Computation.

[62] Yang Gao,et al. Abnormal Event Detection via Multi-Instance Dictionary Learning , 2012, IDEAL.

[63] Xavier Bresson,et al. Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering , 2016, NIPS.

[64] Jonghyun Choi,et al. Learning Temporal Regularity in Video Sequences , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[65] Yusha Liu,et al. Classifier Two Sample Test for Video Anomaly Detections , 2018, BMVC.

[66] Arash Vahdat,et al. Toward Robustness against Label Noise in Training Deep Discriminative Neural Networks , 2017, NIPS.

[67] Ehud Rivlin,et al. Robust Real-Time Unusual Event Detection using Multiple Fixed-Location Monitors , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[68] Alessandro Perina,et al. Angry Crowds: Detecting Violent Events in Videos , 2016, ECCV.

[69] R. Bellman. Dynamic programming. , 1957, Science.

[70] Shenghua Gao,et al. Remembering history with convolutional LSTM for anomaly detection , 2017, 2017 IEEE International Conference on Multimedia and Expo (ICME).

[71] Richard S. Zemel,et al. Gated Graph Sequence Neural Networks , 2015, ICLR.

[72] Nannan Li,et al. Anomaly Detection in Video Surveillance via Gaussian Process , 2015, Int. J. Pattern Recognit. Artif. Intell..

[73] Jie Shao,et al. An anomaly-introduced learning method for abnormal event detection , 2018, Multimedia Tools and Applications.

[74] Nannan Li,et al. Multi-scale analysis of contextual information within spatio-temporal video volumes for anomaly detection , 2014, 2014 IEEE International Conference on Image Processing (ICIP).

[75] Kristen Grauman,et al. Slow and Steady Feature Analysis: Higher Order Temporal Coherence in Video , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).