On the improvement of human action recognition from depth map sequences using Space-Time Occupancy Patterns

We present a new visual representation for 3D action recognition from sequences of depth maps. In this representation, the space and time axes are divided into multiple segments to define a 4D grid for each depth map sequence. Each cell in the grid is associated with an occupancy value, which is a function of the number of space-time points falling into that cell. The occupancy values of all the cells form a high-dimensional feature vector, called a Space-Time Occupancy Pattern (STOP). We then perform dimensionality reduction to obtain lower-dimensional feature vectors. The advantage of STOP is that it preserves spatial and temporal contextual information between space-time cells while being flexible enough to accommodate intra-action variations. Furthermore, we combine depth maps with skeletons to obtain view invariance, and we present an automatic segmentation and time alignment method for on-line recognition of depth sequences. Our visual representation is validated with experiments on a public 3D human action dataset.
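
The grid construction described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `stop_features`, the uniform segmentation of each axis, and the saturating occupancy function `min(1, count / saturation)` are assumptions made for illustration, since the abstract only states that occupancy is a function of the point count. The dimensionality reduction step is omitted.

```python
from itertools import product


def stop_features(points, bounds, bins, saturation=4):
    """Build an occupancy feature vector over a 4D (x, y, z, t) grid.

    points:     iterable of (x, y, z, t) space-time points
    bounds:     four (low, high) pairs, one per axis
    bins:       number of segments per axis, e.g. (nx, ny, nz, nt)
    saturation: hypothetical count at which a cell is considered full
    """
    counts = {}
    for point in points:
        cell = []
        for value, (low, high), n in zip(point, bounds, bins):
            if not low <= value <= high:
                break  # point lies outside the grid, so skip it
            # Map the coordinate to a segment index; the upper bound
            # is assigned to the last segment.
            index = min(int((value - low) / (high - low) * n), n - 1)
            cell.append(index)
        else:
            key = tuple(cell)
            counts[key] = counts.get(key, 0) + 1

    # Assumed occupancy function: point count clipped to [0, 1].
    # Cells are flattened in a fixed order to form the feature vector.
    return [
        min(1.0, counts.get(cell, 0) / saturation)
        for cell in product(*(range(n) for n in bins))
    ]


# Toy example: three points in a unit hypercube split 2 x 2 x 2 x 2.
toy_points = [(0.1, 0.1, 0.1, 0.0), (0.1, 0.2, 0.1, 0.1), (0.9, 0.9, 0.9, 1.0)]
toy_features = stop_features(toy_points, ((0.0, 1.0),) * 4, (2, 2, 2, 2), saturation=2)
print(len(toy_features))
```

With real depth sequences the vector length is the product of the segment counts, which grows quickly; this is why a reduction step follows in the pipeline described above.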
