4-dimensional local spatio-temporal features for human activity recognition

Recognizing human activities from conventional color image sequences faces many challenges, such as complex backgrounds, camera motion, and illumination changes. In this paper, we propose a new 4-dimensional (4D) local spatio-temporal feature that combines intensity and depth information. The feature detector applies separate filters along the three spatial dimensions and the temporal dimension to detect feature points. The feature descriptor then computes and concatenates the intensity and depth gradients within a 4D hypercuboid centered at each detected feature point. For recognizing human activities, Latent Dirichlet Allocation with Gibbs sampling is used as the classifier. Experiments are performed on a newly created database containing six human activities, each with 33 samples exhibiting complex variations. Experimental results demonstrate the promising performance of the proposed features for the task of human activity recognition.
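The descriptor stage described above can be illustrated with a short sketch. This is a hypothetical, simplified reading of the abstract, not the paper's actual implementation: the function name, the central-difference gradient, the nested-list volume layout, and the cuboid radius `r` are all assumptions. The sketch only shows the core idea of sampling intensity and depth gradients at every voxel of a 4D hypercuboid around a detected point and concatenating them into one vector.

```python
def descriptor_4d(intensity, depth, center, r=1):
    """Concatenate intensity and depth gradients inside a 4D hypercuboid.

    intensity, depth: nested lists indexed [t][z][y][x] (assumed layout).
    center: (t, z, y, x) coordinates of the detected feature point.
    r: half-width of the hypercuboid along every dimension (assumed).
    """
    ct, cz, cy, cx = center
    desc = []
    for t in range(ct - r, ct + r + 1):
        for z in range(cz - r, cz + r + 1):
            for y in range(cy - r, cy + r + 1):
                for x in range(cx - r, cx + r + 1):
                    # gradients of both modalities at this voxel
                    for vol in (intensity, depth):
                        # central finite differences along t, z, y, x
                        desc.append((vol[t + 1][z][y][x] - vol[t - 1][z][y][x]) / 2)
                        desc.append((vol[t][z + 1][y][x] - vol[t][z - 1][y][x]) / 2)
                        desc.append((vol[t][z][y + 1][x] - vol[t][z][y - 1][x]) / 2)
                        desc.append((vol[t][z][y][x + 1] - vol[t][z][y][x - 1]) / 2)
    return desc
```

Under these assumptions the descriptor has 2 modalities × 4 axes × (2r+1)^4 voxels entries (648 values for r=1); a practical implementation would likely pool or quantize the gradients rather than keep the raw concatenation.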
