Human activity recognition in video using a hierarchical probabilistic latent model

In this work, we address the recognition of human activities from video sequences. To this end, we propose a novel hierarchical probabilistic latent (HPL) model consisting of four layers, from the bottom up: a spatiotemporal visual feature layer, an atomic pattern layer, a latent topic layer, and a behavior pattern layer. In this way, complex human activities are decomposed into low-level features, atomic patterns, and latent topics, a representation better suited to the automatic understanding of human behavior. Given a video sequence, spatial and temporal interest points are extracted as low-level visual features and clustered into distributions of atomic patterns using hierarchical Bayesian networks (HBNs). The proposed HPL model then represents behavior patterns and latent topics as distributions over atomic patterns. Extensive experiments on the KTH dataset demonstrate the efficiency of the proposed framework.
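The layered decomposition can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the inference procedure, so the sketch assumes a simple mixture in which each behavior is a distribution over latent topics and each topic is a distribution over atomic patterns. The codebook, the probability tables, and all function names are hypothetical, and nearest-codeword assignment stands in for the HBN-based clustering. The behavior labels are borrowed from the KTH action classes purely for illustration.

```python
import math

# Hypothetical toy parameters (not taken from the paper):
# 2 behaviors, 2 latent topics, 3 atomic patterns.
TOPIC_GIVEN_BEHAVIOR = {
    "walking": [0.9, 0.1],  # p(topic | behavior)
    "boxing": [0.2, 0.8],
}
PATTERN_GIVEN_TOPIC = [
    [0.7, 0.2, 0.1],  # p(atomic pattern | topic 0)
    [0.1, 0.3, 0.6],  # p(atomic pattern | topic 1)
]


def nearest_pattern(descriptor, codebook):
    """Index of the codeword closest to the descriptor (squared Euclidean).

    Assumption: a stand-in for the HBN clustering named in the abstract.
    """
    def sq_dist(k):
        return sum((d - c) ** 2 for d, c in zip(descriptor, codebook[k]))
    return min(range(len(codebook)), key=sq_dist)


def pattern_histogram(descriptors, codebook):
    """Count how often each atomic pattern occurs in one video."""
    counts = [0] * len(codebook)
    for descriptor in descriptors:
        counts[nearest_pattern(descriptor, codebook)] += 1
    return counts


def pattern_given_behavior(behavior):
    """Marginalize over topics: p(a | b) = sum_z p(a | z) * p(z | b)."""
    topics = TOPIC_GIVEN_BEHAVIOR[behavior]
    n_patterns = len(PATTERN_GIVEN_TOPIC[0])
    return [
        sum(topics[z] * PATTERN_GIVEN_TOPIC[z][a] for z in range(len(topics)))
        for a in range(n_patterns)
    ]


def log_likelihood(counts, behavior):
    """Log-likelihood of the pattern counts, up to a behavior-independent constant."""
    probs = pattern_given_behavior(behavior)
    return sum(c * math.log(p) for c, p in zip(counts, probs))


def classify(counts):
    """Pick the behavior whose pattern distribution best explains the counts."""
    return max(TOPIC_GIVEN_BEHAVIOR, key=lambda b: log_likelihood(counts, b))
```

In this toy setting, a video is first reduced to a histogram of atomic patterns with pattern_histogram and then passed to classify. A trained model would estimate the two probability tables from data rather than fix them by hand.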
