Joint Detection and Recounting of Abnormal Events by Learning Deep Generic Knowledge

This paper addresses the problem of joint detection and recounting of abnormal events in videos. Recounting of abnormal events, i.e., explaining why they are judged to be abnormal, is an unexplored but critical task in video surveillance, because it helps human observers quickly judge if they are false alarms or not. To describe the events in the human-understandable form for event recounting, learning generic knowledge about visual concepts (e.g., object and action) is crucial. Although convolutional neural networks (CNNs) have achieved promising results in learning such concepts, it remains an open question as to how to effectively use CNNs for abnormal event detection, mainly due to the environment-dependent nature of the anomaly detection. In this paper, we tackle this problem by integrating a generic CNN model and environment-dependent anomaly detectors. Our approach first learns CNN with multiple visual tasks to exploit semantic information that is useful for detecting and recounting abnormal events. By appropriately plugging the model into anomaly detectors, we can detect and recount abnormal events while taking advantage of the discriminative power of CNNs. Our approach outperforms the state-of-the-art on Avenue and UCSD Ped2 benchmarks for abnormal event detection and also produces promising results of abnormal event recounting.

[1]  Björn Ommer,et al.  Video parsing for abnormality detection , 2011, 2011 International Conference on Computer Vision.

[2]  Ali Farhadi,et al.  Describing objects by their attributes , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[3]  Hui Cheng,et al.  Multimedia event recounting with concept based representation , 2012, ACM Multimedia.

[4]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[5]  Nicu Sebe,et al.  Learning Deep Representations of Appearance and Motion for Anomalous Event Detection , 2015, BMVC.

[6]  Jiaxuan Wang,et al.  HICO: A Benchmark for Recognizing Human-Object Interactions in Images , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[7]  Yi Yang,et al.  DevNet: A Deep Event Network for multimedia event detection and evidence recounting , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Ehud Rivlin,et al.  Robust Real-Time Unusual Event Detection using Multiple Fixed-Location Monitors , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  Wen-Hsien Fang,et al.  Video anomaly detection and localization using hierarchical feature representation and Gaussian process regression , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Jonghyun Choi,et al.  Learning Temporal Regularity in Video Sequences , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Vladlen Koltun,et al.  Geodesic Object Proposals , 2014, ECCV.

[12]  Stephen J. Maybank,et al.  Fusion of Multiple Tracking Algorithms for Robust People Tracking , 2002, ECCV.

[13]  Nuno Vasconcelos,et al.  Anomaly detection in crowded scenes , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[14]  Yongdong Zhang,et al.  Multi-task deep visual-semantic embedding for video thumbnail selection , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Michael S. Bernstein,et al.  Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations , 2016, International Journal of Computer Vision.

[16]  Andrew Zisserman,et al.  Convolutional Two-Stream Network Fusion for Video Action Recognition , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[18]  Jitendra Malik,et al.  Learning to segment moving objects in videos , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[20]  Tianzhu Zhang,et al.  Learning semantic scene models by object classification and trajectory clustering , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[21]  Christophe Rosenberger,et al.  Abnormal events detection based on spatio-temporal co-occurences , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[22]  Dong Liu,et al.  Recognizing Complex Events in Videos by Learning Key Static-Dynamic Evidences , 2014, ECCV.

[23]  David A. Clifton,et al.  A review of novelty detection , 2014, Signal Process..

[24]  Cordelia Schmid,et al.  Product Quantization for Nearest Neighbor Search , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[25]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[26]  L. Kratz,et al.  Anomaly detection in extremely crowded scenes using spatio-temporal motion pattern models , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[27]  Fei-Fei Li,et al.  Online detection of unusual events in videos via dynamic sparse coding , 2011, CVPR 2011.

[28]  Nuno Vasconcelos,et al.  Anomaly Detection and Localization in Crowded Scenes , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[29]  Raffaella Bernardi,et al.  TUHOI: Trento Universal Human Object Interaction Dataset , 2014, VL@COLING.

[30]  Kenta Oono,et al.  Chainer : a Next-Generation Open Source Framework for Deep Learning , 2015 .

[31]  Mubarak Shah,et al.  Learning object motion patterns for anomaly detection and improved object detection , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[32]  Mubarak Shah,et al.  Abnormal crowd behavior detection using social force model , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[33]  Junsong Yuan,et al.  Sparse reconstruction cost for abnormal event detection , 2011, CVPR 2011.

[34]  Lorenzo Torresani,et al.  Learning Spatiotemporal Features with 3D Convolutional Networks , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[35]  Mahmood Fathy,et al.  Real-time anomaly detection and localization in crowded scenes , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[36]  Cewu Lu,et al.  Abnormal Event Detection at 150 FPS in MATLAB , 2013, 2013 IEEE International Conference on Computer Vision.

[37]  Martial Hebert,et al.  A Discriminative Framework for Anomaly Detection in Large Videos , 2016, ECCV.

[38]  Aggelos K. Katsaggelos,et al.  Anomalous video event detection using spatiotemporal context , 2011 .

[39]  D. W. Scott,et al.  Multivariate Density Estimation, Theory, Practice and Visualization , 1992 .

[40]  Venkatesh Saligrama,et al.  Video anomaly detection based on local statistical aggregates , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[41]  K. Grauman,et al.  Observe locally, infer globally: A space-time MRF for detecting abnormal activities with incremental updates , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[42]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).