Depth Context: a new descriptor for human activity recognition by using sole depth sequences

Human activity recognition using sole depth information from 3D sensors achieves superior performances to tackle light changes and cluttered backgrounds than using RGB sequences from traditional cameras. However, the noises and occlusions in depth data, which are common problems for 3D sensors, are not well handled. Moreover, many existing methods ignore the strong contextual information from depth data, resulting in limited performances on distinguishing similar activities. To deal with these problems, a local point detector is developed by sampling local points based on both motion and shape clues to represent human activities in depth sequences. Then a novel descriptor named Depth Context is designed for each local point to capture both local and global contextual constrains. Finally, a Bag-of-Visual-Words (BoVW) model is applied to generating human activity representations, which serve as the inputs for a non-linear SVM classifier. State-of-the-art results namely 94.28%, 98.21% and 95.37% are achieved on three public benchmark datasets: MSRAction3D, MSRGesture3D and SKIG, which show the efficiency of proposed method to capture structural depth information. Additional experimental results show that our method is robust to partial occlusions in depth data, and also robust to the changes of pose, illumination and background to some extent.

[1]  Franziska Meier,et al.  3D Shape Context and Distance Transform for action recognition , 2008, 2008 19th International Conference on Pattern Recognition.

[2]  Wanqing Li,et al.  Action recognition based on a bag of 3D points , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Workshops.

[3]  Andrew Zisserman,et al.  Efficient Additive Kernels via Explicit Feature Maps , 2012, IEEE Trans. Pattern Anal. Mach. Intell..

[4]  Xiaodong Yang,et al.  Super Normal Vector for Activity Recognition Using Depth Sequences , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[5]  Michael Harville,et al.  Foreground segmentation using adaptive mixture models in color and depth , 2001, Proceedings IEEE Workshop on Detection and Recognition of Events in Video.

[6]  Luc Van Gool,et al.  An Efficient Dense and Scale-Invariant Spatio-Temporal Interest Point Detector , 2008, ECCV.

[7]  Alexandros André Chaaraoui,et al.  A discussion on the validation tests employed to compare human action recognition methods using the MSR Action3D dataset , 2014, ArXiv.

[8]  J. Giles Inside the race to hack the Kinect , 2010 .

[9]  Junsong Yuan,et al.  Learning Actionlet Ensemble for 3D Human Action Recognition , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[10]  Aaron F. Bobick,et al.  Recognition of human body motion using phase space constraints , 1995, Proceedings of IEEE International Conference on Computer Vision.

[11]  Hairong Qi,et al.  Group Sparsity and Geometry Constrained Dictionary Learning for Action Recognition from Depth Maps , 2013, 2013 IEEE International Conference on Computer Vision.

[12]  Barbara Caputo,et al.  Recognizing human actions: a local SVM approach , 2004, ICPR 2004.

[13]  Rama Chellappa,et al.  Human Action Recognition by Representing 3D Skeletons as Points in a Lie Group , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[14]  Mineichi Kudo,et al.  Learning action patterns in difference images for efficient action recognition , 2014, Neurocomputing.

[15]  Xiaodong Yang,et al.  EigenJoints-based action recognition using Naïve-Bayes-Nearest-Neighbor , 2012, 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.

[16]  Ronen Basri,et al.  Actions as space-time shapes , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[17]  Toby Sharp,et al.  Real-time human pose recognition in parts from single depth images , 2011, CVPR.

[18]  Serge J. Belongie,et al.  Behavior recognition via sparse spatio-temporal features , 2005, 2005 IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance.

[19]  Ling Shao,et al.  Learning Discriminative Representations from RGB-D Video Data , 2013, IJCAI.

[20]  Xiaodong Yang,et al.  Histogram of 3D Facets: A characteristic descriptor for hand gesture recognition , 2013, 2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG).

[21]  Arif Mahmood,et al.  Real time action recognition using histograms of depth gradients and random decision forests , 2014, IEEE Winter Conference on Applications of Computer Vision.

[22]  Jake K. Aggarwal,et al.  Spatio-temporal Depth Cuboid Similarity Feature for Activity Recognition Using Depth Camera , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[23]  Mario Fernando Montenegro Campos,et al.  STOP: Space-Time Occupancy Patterns for 3D Action Recognition from Depth Map Sequences , 2012, CIARP.

[24]  Luc Vincent,et al.  Morphological grayscale reconstruction in image analysis: applications and efficient algorithms , 1993, IEEE Trans. Image Process..

[25]  Andreas E. Savakis,et al.  Grassmannian sparse representations , 2015, J. Electronic Imaging.

[26]  Xiaodong Yang,et al.  Recognizing actions using depth motion maps-based histograms of oriented gradients , 2012, ACM Multimedia.

[27]  Pol Cirujeda,et al.  4DCov: A Nested Covariance Descriptor of Spatio-Temporal Features for Gesture Recognition in Depth Sequences , 2014, 2014 2nd International Conference on 3D Vision.

[28]  Yee Whye Teh,et al.  A Fast Learning Algorithm for Deep Belief Nets , 2006, Neural Computation.

[29]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[30]  Ivan Laptev,et al.  On Space-Time Interest Points , 2005, International Journal of Computer Vision.

[31]  Arif Mahmood,et al.  HOPC: Histogram of Oriented Principal Components of 3D Pointclouds for Action Recognition , 2014, ECCV.

[32]  Jitendra Malik,et al.  Shape Matching and Object Recognition , 2006, Toward Category-Level Object Recognition.

[33]  Ying Wu,et al.  Robust 3D Action Recognition with Random Occupancy Patterns , 2012, ECCV.

[34]  Ling Shao,et al.  Combining appearance and structural features for human action recognition , 2013, Neurocomputing.

[35]  Zicheng Liu,et al.  HON4D: Histogram of Oriented 4D Normals for Activity Recognition from Depth Sequences , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[36]  Cordelia Schmid,et al.  Learning realistic human actions from movies , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[37]  James W. Davis,et al.  The representation and recognition of human movement using temporal templates , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[38]  Bogdan Kwolek,et al.  Improving fall detection by the use of depth sensor and accelerometer , 2015, Neurocomputing.

[39]  Barbara Caputo,et al.  Recognizing human actions: a local SVM approach , 2004, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004..

[40]  Alan L. Yuille,et al.  An Approach to Pose-Based Action Recognition , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[41]  Cordelia Schmid,et al.  Evaluation of Local Spatio-temporal Features for Action Recognition , 2009, BMVC.

[42]  Andrew W. Fitzgibbon,et al.  Efficient regression of general-activity human poses from depth images , 2011, 2011 International Conference on Computer Vision.

[43]  Cordelia Schmid,et al.  A Spatio-Temporal Descriptor Based on 3D-Gradients , 2008, BMVC.

[44]  Cordelia Schmid,et al.  Dense Trajectories and Motion Boundary Descriptors for Action Recognition , 2013, International Journal of Computer Vision.

[45]  VincentL. Morphological grayscale reconstruction in image analysis , 1993 .

[46]  Ying Wu,et al.  Mining actionlet ensemble for action recognition with depth cameras , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[47]  Zhengyou Zhang,et al.  Microsoft Kinect Sensor and Its Effect , 2012, IEEE Multim..

[48]  James M. Keller,et al.  Histogram of Oriented Normal Vectors for Object Recognition with a Depth Sensor , 2012, ACCV.

[49]  Cristian Sminchisescu,et al.  The Moving Pose: An Efficient 3D Kinematics Descriptor for Low-Latency Action Recognition and Detection , 2013, 2013 IEEE International Conference on Computer Vision.