A hierarchical Bayesian network for event recognition of human actions and interactions

Abstract. Recognizing human interactions is a challenging task because of the multiple body parts of the interacting persons and the occlusions that accompany them. This paper presents a method for recognizing two-person interactions using a hierarchical Bayesian network (BN). The poses of simultaneously tracked body parts are estimated at the low level of the BN, and the overall body pose is estimated at the high level of the BN. The evolution of the poses of the multiple body parts is processed by a dynamic Bayesian network (DBN). The recognition of two-person interactions is expressed as semantic verbal descriptions at multiple levels: individual body-part motions at the low level, single-person actions at the middle level, and two-person interactions at the high level. Example sequences of interacting persons illustrate the success of the proposed framework.
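The two-level idea above (body-part poses at the low level informing an overall pose at the high level, with a temporal model tracking how that pose evolves) can be illustrated with a small discrete sketch. Everything in it is a hypothetical assumption for illustration, not the paper's model: the pose labels, the two body parts, and all probability tables are invented, the body-part nodes are treated as conditionally independent children of one overall-pose node, and the DBN is reduced to HMM-style forward filtering over that single variable.

```python
from typing import Dict, List

# Hypothetical overall-pose states for one person (assumption, not from the paper).
POSES = ["standing", "reaching", "leaning"]

# Hypothetical tables P(part_state | overall_pose): one row per overall pose,
# one column per discrete body-part state.
PART_CPTS: Dict[str, List[List[float]]] = {
    "arm":   [[0.8, 0.2], [0.1, 0.9], [0.5, 0.5]],  # 0 = lowered, 1 = raised
    "torso": [[0.9, 0.1], [0.7, 0.3], [0.1, 0.9]],  # 0 = upright, 1 = bent
}

# Hypothetical transition matrix P(pose_t | pose_{t-1}) for the temporal layer.
TRANSITION = [
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.2, 0.1, 0.7],
]


def normalize(v: List[float]) -> List[float]:
    """Rescale a non-negative vector so it sums to 1."""
    total = sum(v)
    return [x / total for x in v]


def evidence_likelihood(obs: Dict[str, int]) -> List[float]:
    """Bottom-up step: product of P(observed part state | pose) over all parts."""
    lik = [1.0] * len(POSES)
    for part, state in obs.items():
        cpt = PART_CPTS[part]
        for k in range(len(POSES)):
            lik[k] *= cpt[k][state]
    return lik


def filter_poses(prior: List[float], observations: List[Dict[str, int]]) -> List[List[float]]:
    """Forward filtering: posterior over the overall pose at each frame."""
    n = len(POSES)
    belief = list(prior)
    history = []
    for t, obs in enumerate(observations):
        if t > 0:
            # Predict: push the previous belief through the transition model.
            belief = [sum(belief[j] * TRANSITION[j][k] for j in range(n)) for k in range(n)]
        # Update: weight the prediction by the body-part evidence and renormalize.
        lik = evidence_likelihood(obs)
        belief = normalize([b * l for b, l in zip(belief, lik)])
        history.append(belief)
    return history


if __name__ == "__main__":
    frames = [{"arm": 0, "torso": 0}, {"arm": 1, "torso": 0}, {"arm": 1, "torso": 0}]
    for belief in filter_poses([1 / 3] * 3, frames):
        print(POSES[belief.index(max(belief))], [round(b, 3) for b in belief])
```

With these invented tables, a single frame with a raised arm is not enough to overturn the "standing" belief carried over from the first frame; once the raised arm persists, the most probable overall pose switches to "reaching". This shows why a temporal layer smooths isolated body-part observations.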
