Integrating multi-stage depth-induced contextual information for human action recognition and localization

Human action recognition and localization is a challenging vision task with promising applications. Recently developed commodity depth sensors (e.g., Microsoft Kinect) have opened up new opportunities for tackling this problem, and several depth-image-based human motion features have been developed for action representation. However, how depth information can be used effectively in mid- or high-level representations for action detection has yet to be explored, in particular how depth-induced three-dimensional contextual information can model human-human, human-object, and human-surroundings interactions. In this paper, we propose a novel action recognition and localization framework that fuses depth-induced contextual information from different levels of the processing pipeline to understand these interactions. First, the depth image is combined with the grayscale image for more robust detection of human subjects and objects. Second, the three-dimensional spatial and temporal relationships among human subjects and objects are represented using the combination of grayscale and depth images. Third, depth information is further used to represent different types of indoor scenes. Finally, we fuse this multi-stage depth-induced contextual information into a unified action detection framework. Extensive experiments on a challenging grayscale + depth human action detection benchmark database demonstrate the effectiveness of the depth-induced contextual information and the high detection accuracy of the proposed framework.
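To illustrate why depth can help when modeling spatial relationships between detected subjects, the following is a minimal sketch and not the method of the paper. It assumes a standard pinhole camera model, and the intrinsics (`FX`, `FY`, `CX`, `CY`), the helper names, and the pixel and depth values are all hypothetical. It shows that two pairs of detections with the same separation in the image plane can be very different distances apart in three dimensions once depth is taken into account.

```python
import math

# Hypothetical pinhole intrinsics (focal lengths and principal point, in pixels).
FX, FY = 525.0, 525.0
CX, CY = 319.5, 239.5


def backproject(u, v, depth_m):
    """Map pixel (u, v) with depth in metres to a 3D point in the camera frame."""
    x = (u - CX) * depth_m / FX
    y = (v - CY) * depth_m / FY
    return (x, y, depth_m)


def pixel_distance(p, q):
    """Euclidean distance between two pixel coordinates (image plane only)."""
    return math.dist(p, q)


def distance_3d(p, q):
    """Euclidean distance between two 3D camera-frame points, in metres."""
    return math.dist(p, q)


# Centres of three hypothetical detections: (u, v) in pixels, depth in metres.
a_px, a_depth = (300, 240), 2.0
b_px, b_depth = (340, 240), 2.0  # next to A in the image, same depth
c_px, c_depth = (340, 240), 5.0  # same pixel as B, but much farther away

a_3d = backproject(*a_px, a_depth)
b_3d = backproject(*b_px, b_depth)
c_3d = backproject(*c_px, c_depth)

# Identical separation in the image plane...
d2_ab = pixel_distance(a_px, b_px)
d2_ac = pixel_distance(a_px, c_px)

# ...but very different separation in 3D.
d3_ab = distance_3d(a_3d, b_3d)
d3_ac = distance_3d(a_3d, c_3d)

print(f"A-B: {d2_ab:.0f} px, {d3_ab:.2f} m")
print(f"A-C: {d2_ac:.0f} px, {d3_ac:.2f} m")
```

In this sketch, a grayscale-only representation would treat the A-B and A-C pairs identically, whereas the back-projected points separate a plausible interaction (A-B) from two subjects that merely overlap in the image (A-C).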
