An Efficient and Robust System for Multiperson Event Detection in Real-World Indoor Surveillance Scenes

Security cameras are now common in public places, which motivates intelligent systems that can detect events automatically and efficiently. This paper proposes a novel algorithm for multiperson event detection. To achieve faster-than-real-time performance, features are extracted directly from compressed MPEG video. We propose a novel histogram-based feature descriptor that encodes the angles between extracted particle trajectories and thereby captures the motion patterns of multiperson events in the video. To alleviate the need for fine-grained annotation, we propose the use of labeled latent Dirichlet allocation, a weakly supervised method that works with coarse temporal annotations, which are much simpler to obtain. The resulting system runs at approximately 10 times real time while preserving state-of-the-art detection performance for multiperson events on a 100-h real-world surveillance data set (TRECVid surveillance event detection).
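As an illustration of the idea behind a histogram of angles between trajectories, the following is a minimal sketch and not the descriptor defined in the paper. It assumes each particle trajectory is a list of (x, y) points, summarizes each trajectory by the direction of its net displacement, and bins the pairwise angle differences. The function names, the bin count, and the use of net displacement are all hypothetical choices made for this example.

```python
import math
from itertools import combinations


def direction(traj):
    """Angle (radians) of the net displacement of a trajectory [(x, y), ...]."""
    (x0, y0), (x1, y1) = traj[0], traj[-1]
    return math.atan2(y1 - y0, x1 - x0)


def angle_histogram(trajectories, n_bins=8):
    """Normalized histogram of pairwise angles (in [0, pi]) between trajectories.

    Assumption: stationary trajectories (start == end) carry no direction
    and are skipped.
    """
    angles = [direction(t) for t in trajectories if t[0] != t[-1]]
    hist = [0] * n_bins
    for a, b in combinations(angles, 2):
        diff = abs(a - b) % (2 * math.pi)
        if diff > math.pi:
            diff = 2 * math.pi - diff  # fold into [0, pi]
        # Clamp so that an angle of exactly pi falls in the last bin.
        idx = min(int(diff / math.pi * n_bins), n_bins - 1)
        hist[idx] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else [0.0] * n_bins


# Two particles moving in opposite directions, e.g. people approaching each other.
opposing = [[(0, 0), (1, 0)], [(1, 0), (0, 0)]]
# Two particles moving in the same direction, e.g. people walking together.
parallel = [[(0, 0), (1, 0)], [(0, 1), (2, 1)]]
```

In this sketch, opposing motion concentrates mass in the last bin and parallel motion in the first, which hints at how such a histogram could separate different multiperson motion patterns.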
