Composite behavior analysis for video surveillance using hierarchical dynamic Bayesian networks

Analyzing composite behaviors involving objects from multiple categories in surveillance videos is a challenging task due to the complicated relationships among humans and objects. This paper presents a novel behavior analysis framework using a hierarchical dynamic Bayesian network (DBN) for video surveillance systems. The model extracts objects' behaviors and their relationships by representing behaviors through their spatial-temporal characteristics. The DBN recognizes object behaviors at multiple levels: features of objects at the low level, objects and their relationships at the middle level, and events at the high level, where an event refers to the behavior of a single type of object as well as to behaviors involving several types of objects, such as "a person getting in a car." Furthermore, to reduce complexity, a simple model selection criterion is introduced, by which the appropriate model is picked from a pool of candidate models. Experiments demonstrate that the proposed framework can efficiently recognize and semantically describe composite object and human activities in surveillance videos.
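
The abstract does not state which model selection criterion is used, so the following is only a minimal sketch of the general idea of picking one model from a pool of candidates. It assumes a Schwarz-style Bayesian information criterion (BIC); the `Candidate` class, the model names, and all numeric values are hypothetical and chosen purely for illustration, not taken from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    """A hypothetical candidate model in the selection pool."""
    name: str              # illustrative label, not from the paper
    log_likelihood: float  # maximized log-likelihood on the training data
    num_params: int        # number of free parameters in the model


def bic(candidate: Candidate, num_samples: int) -> float:
    """Schwarz's BIC (assumed criterion): lower values are preferred."""
    return (-2.0 * candidate.log_likelihood
            + candidate.num_params * math.log(num_samples))


def select_model(pool, num_samples: int) -> Candidate:
    """Return the candidate with the smallest BIC score."""
    return min(pool, key=lambda c: bic(c, num_samples))


# Made-up pool: larger models fit better but pay a larger complexity penalty.
pool = [
    Candidate("dbn_2_states", log_likelihood=-1250.0, num_params=12),
    Candidate("dbn_4_states", log_likelihood=-1100.0, num_params=40),
    Candidate("dbn_8_states", log_likelihood=-1090.0, num_params=144),
]

best = select_model(pool, num_samples=500)
print(best.name)
```

With these illustrative numbers, the mid-sized model is chosen: its gain in likelihood over the smallest model outweighs the extra parameter penalty, while the largest model's marginal gain does not.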
