Recognizing Actions from Depth Cameras as Weakly Aligned Multi-part Bag-of-Poses

Recently released depth cameras provide effective estimation of the 3D positions of skeletal joints in temporal sequences of depth maps. In this work, we propose an efficient yet effective method to recognize human actions from the positions of these joints. First, the body skeleton is decomposed into a set of kinematic chains, and the position of each joint is expressed in a locally defined reference system, making the coordinates invariant to body translations and rotations. A multi-part bag-of-poses approach is then defined, which permits the separate alignment of body parts through nearest-neighbor classification. Experiments conducted on the Florence 3D Action dataset and the MSR Daily Activity dataset show promising results.
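To make the pipeline concrete, the sketch below illustrates one plausible reading of the approach in Python: joint positions are re-expressed in a torso-anchored frame (giving translation and rotation invariance), grouped into body parts, and a query sequence is classified by a per-part nearest-neighbor score. The joint indices, part grouping, torso-frame construction, and scoring rule are illustrative assumptions, not the paper's exact kinematic-chain decomposition or classifier.

```python
import numpy as np

def torso_frame(skeleton):
    """Build an orthonormal torso frame from one skeleton frame.

    skeleton: (J, 3) array of 3D joint positions.
    Assumed joint indices: 0 = hip center, 1 = spine, 2 = left hip, 3 = right hip.
    """
    origin = skeleton[0]
    up = skeleton[1] - origin                 # hip center -> spine
    right = skeleton[3] - skeleton[2]         # left hip -> right hip
    z = up / np.linalg.norm(up)
    x = right - np.dot(right, z) * z          # Gram-Schmidt: remove the z component
    x /= np.linalg.norm(x)
    y = np.cross(z, x)
    return origin, np.stack([x, y, z])        # rows are the frame axes

def local_coordinates(skeleton):
    """Express every joint in the torso frame (translation/rotation invariant)."""
    origin, R = torso_frame(skeleton)
    return (skeleton - origin) @ R.T

# Illustrative grouping of joints into body parts (hypothetical indices).
PARTS = {
    "left_arm":  [4, 5, 6],
    "right_arm": [7, 8, 9],
    "left_leg":  [10, 11, 12],
    "right_leg": [13, 14, 15],
}

def part_descriptors(sequence):
    """sequence: (T, J, 3) joint positions -> per-part sets of pose vectors."""
    local = np.stack([local_coordinates(f) for f in sequence])
    return {name: local[:, idx, :].reshape(len(sequence), -1)
            for name, idx in PARTS.items()}

def nn_score(query_poses, train_poses):
    """Sum of distances from each query pose to its nearest training pose."""
    d = np.linalg.norm(query_poses[:, None, :] - train_poses[None, :, :], axis=-1)
    return d.min(axis=1).sum()

def classify(query_seq, train_seqs, train_labels):
    """Label of the training sequence minimizing the summed per-part NN score."""
    q = part_descriptors(query_seq)
    best_label, best_score = None, np.inf
    for seq, label in zip(train_seqs, train_labels):
        t = part_descriptors(seq)
        score = sum(nn_score(q[p], t[p]) for p in PARTS)  # parts aligned independently
        if score < best_score:
            best_label, best_score = label, score
    return best_label
```

In this sketch the per-part scores are computed and summed independently, which is what allows each body part to be matched against the training poses on its own rather than forcing a single whole-body alignment.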
