Describing Trajectory of Surface Patch for Human Action Recognition on RGB and Depth Videos

This letter proposes a new feature, the trajectory of surface patches (ToSP), for human action recognition, together with a novel scheme for jointly exploiting RGB and depth videos. RGB data provides appearance information, which we use to track specific patches on the body surface, while depth data provides spatial information, which we use to describe those patches. Specifically, spatio-temporal interest points serve as initial points that are tracked in both temporal directions. A ToSP is extracted by retaining the point-cloud neighborhood of each point along the trajectory. Using a temporal pyramid, ToSPs are matched at several levels based on surface features extracted from ToSP segments. The proposed feature captures both the shape and position variations of surface patches, and thus combines the advantages of trajectory-based and local spatio-temporal features. Experimental results show that the proposed feature outperforms existing trajectory-based features and depth features.
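The pipeline in the abstract can be sketched in a few steps: back-project each depth frame to a point cloud, keep the neighborhood around each tracked surface point, and split the resulting trajectory into temporal-pyramid segments. The sketch below is a minimal illustration under assumed names and camera intrinsics, not the authors' implementation.

```python
import numpy as np

def depth_to_points(depth, fx=525.0, fy=525.0, cx=None, cy=None):
    """Back-project a depth map (H x W, metres) to an N x 3 point cloud
    with a simple pinhole model. Intrinsics here are assumed values."""
    h, w = depth.shape
    cx = (w - 1) / 2.0 if cx is None else cx
    cy = (h - 1) / 2.0 if cy is None else cy
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

def patch_neighborhood(points, center, k=50):
    """Keep the k nearest 3-D points around a tracked surface point,
    i.e. one surface patch of the ToSP."""
    d = np.linalg.norm(points - center, axis=1)
    return points[np.argsort(d)[:k]]

def temporal_pyramid_segments(trajectory, levels=2):
    """Split a trajectory (per-frame patch list) into 2**l segments at
    each pyramid level l = 0..levels-1, as in pyramid matching."""
    segments = []
    n = len(trajectory)
    for l in range(levels):
        bounds = np.linspace(0, n, 2 ** l + 1).astype(int)
        for s, e in zip(bounds[:-1], bounds[1:]):
            segments.append(trajectory[s:e])
    return segments
```

In the actual method, a surface descriptor would be computed from each patch segment before matching; the hypothetical helpers above only show how patches and segments could be organized.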
