Activity-based surveillance video content modelling

This paper tackles the problem of surveillance video content modelling. Given a set of surveillance videos, the aims of our work are twofold: firstly, to segment a continuous video according to the activities it captures; secondly, to construct a model of the video content with which an unseen activity pattern can be recognised and any unusual activities can be detected. To segment a video based on activity, we propose a semantically meaningful video content representation method and two segmentation algorithms: an offline algorithm offering high segmentation accuracy, and an online algorithm enabling real-time performance. Our video content representation method is based on automatically detected visual events (i.e. 'what is happening in the scene'). This is in contrast to most previous approaches, which represent video content at the signal level using image features such as colour, motion and texture. Our segmentation algorithms are based on detecting breakpoints on a high-dimensional video content trajectory. This differs from most previous approaches, which are based on shot change detection and shot grouping. Having segmented continuous surveillance videos based on activity, we group the activity patterns contained in the video segments into activity classes and construct a composite video content model that is capable of generalising from a small training set to accommodate variations in unseen activity patterns. A run-time accumulative unusual activity measure is introduced to detect unusual behaviour, while usual activity patterns are recognised by an online likelihood ratio test (LRT). This ensures robust and reliable activity recognition and unusual activity detection in the shortest possible time once sufficient visual evidence has become available.
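The idea behind online segmentation, cutting a high-dimensional content trajectory wherever a simple approximation stops fitting it, can be illustrated with a generic sliding-window piecewise-linear sketch. It is not the authors' algorithm. The trajectory is assumed to be a (T, d) array with one row per time step (for example, per-window counts of each automatically detected event class), and the names `interpolation_error`, `online_breakpoints` and the threshold `max_error` are hypothetical.

```python
import numpy as np


def interpolation_error(segment: np.ndarray) -> float:
    """Sum of squared distances between each point of `segment` and the
    matching point on the straight line joining its first and last points,
    assuming the points are evenly spaced in time."""
    n = len(segment)
    if n < 3:
        return 0.0  # two points are always fitted exactly
    start, end = segment[0], segment[-1]
    alphas = np.linspace(0.0, 1.0, n)[:, None]  # shape (n, 1) for broadcasting
    chord = start + alphas * (end - start)       # shape (n, d)
    return float(np.sum((segment - chord) ** 2))


def online_breakpoints(trajectory, max_error: float) -> list:
    """Hypothetical sliding-window segmentation of a (T, d) trajectory.

    Grows a window from the most recent breakpoint; as soon as adding the
    newest point makes the linear-fit error exceed `max_error`, the segment
    is closed at the previous point. Points are consumed one at a time as
    they arrive, so the method can run online (each test re-fits the
    current window). Returns breakpoint indices (index 0 is not included).
    """
    traj = np.asarray(trajectory, dtype=float)
    if traj.ndim == 1:
        traj = traj[:, None]  # treat a 1-D series as a (T, 1) trajectory
    breakpoints = []
    anchor = 0  # index where the current segment starts
    for t in range(1, len(traj)):
        if interpolation_error(traj[anchor:t + 1]) > max_error:
            breakpoints.append(t - 1)
            anchor = t - 1  # the new segment starts at the breakpoint
    return breakpoints
```

For example, a trajectory that moves along one axis and then turns, `online_breakpoints([(k, 0) for k in range(6)] + [(5, k) for k in range(1, 4)], max_error=0.1)`, returns `[5]`, the index of the corner. The single threshold trades over-segmentation (small `max_error`) against missed activity boundaries (large `max_error`); an offline variant could afford to revisit earlier cuts using the whole trajectory.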
Comparative experiments have been carried out on over 10 h of challenging outdoor surveillance video footage to evaluate the proposed segmentation algorithms and modelling approach.
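The interplay between the online likelihood ratio test and the run-time accumulative unusual activity measure can be illustrated with a deliberately simplified sketch. It is not the authors' composite model. Each activity class is assumed to be an independent categorical distribution over discrete event labels, the LRT compares the best-scoring class against the runner-up, and the accumulative unusual measure is assumed to be the mean per-step log-likelihood under the best class. The names `online_decision`, `lrt_threshold`, `unusual_threshold`, `min_steps` and `floor` are hypothetical.

```python
import math
from typing import Dict, Optional, Sequence, Tuple


def online_decision(
    observations: Sequence[str],
    class_models: Dict[str, Dict[str, float]],
    lrt_threshold: float,
    unusual_threshold: float,
    min_steps: int = 3,
    floor: float = 1e-6,
) -> Tuple[Optional[str], int]:
    """Consume event labels one at a time and stop as soon as a decision is
    possible. Returns (decision, t): decision is a class name, "unusual",
    or None if no decision was reached after all len(observations) steps."""
    # Accumulated log-likelihood of the observations so far under each class.
    loglik = {c: 0.0 for c in class_models}
    for t, obs in enumerate(observations, start=1):
        for c, probs in class_models.items():
            # Unseen labels get a small floor probability instead of zero.
            loglik[c] += math.log(probs.get(obs, floor))
        ranked = sorted(loglik.items(), key=lambda kv: kv[1], reverse=True)
        best_class, best_ll = ranked[0]
        # Accumulative unusualness: mean per-step log-likelihood under the
        # best-fitting class; a low value means no usual class explains the data.
        if t >= min_steps and best_ll / t < unusual_threshold:
            return "unusual", t
        # Log-likelihood ratio between the best and the runner-up class.
        ratio = best_ll - ranked[1][1] if len(ranked) > 1 else math.inf
        if ratio > lrt_threshold:
            return best_class, t
    return None, len(observations)
```

With `models = {"A": {"a": 0.7, "b": 0.2, "c": 0.1}, "B": {"a": 0.1, "b": 0.2, "c": 0.7}}`, the call `online_decision(["a", "a", "a"], models, lrt_threshold=3.0, unusual_threshold=-5.0)` returns `("A", 2)`: after two steps the ratio, 2 ln 7 ≈ 3.89, already exceeds the threshold, so the decision is made without waiting for the third observation. Raising `lrt_threshold` delays decisions but makes them less error-prone; `min_steps` stops a single noisy label from being flagged as unusual.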