Efficient Pose-Based Action Recognition

Action recognition from 3D pose data has gained increasing attention because such data is readily available from depth or RGB-D videos. The most successful approaches so far rely on expensive feature selection or mining during training. In this work, we introduce an algorithm that is very efficient for both training and testing. The main idea is that rich structured data like 3D pose does not require sophisticated feature modeling or learning. Instead, we reduce pose data over time to histograms of relative location, velocity, and their correlations, and use partial least squares to learn a compact and discriminative representation from them. Despite its efficiency, our approach achieves state-of-the-art accuracy on four different benchmarks. We further investigate the differences between 2D and 3D pose data for action recognition.
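
The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the choice of joint 0 as the reference joint, the bin count, and the value range are all assumptions made for illustration. The correlation histograms are omitted, and the projection is learned with a simple SVD-based variant of partial least squares (PLS-SVD) instead of whichever PLS formulation the paper uses.

```python
import numpy as np


def pose_histogram_descriptor(poses, n_bins=8, value_range=(-1.0, 1.0)):
    """Reduce a pose sequence of shape (T, J, 3) to concatenated histograms.

    Hypothetical helper: joint 0 is assumed to be the reference (root) joint,
    and coordinates are assumed to be roughly normalized to value_range.
    """
    poses = np.asarray(poses, dtype=float)
    # Relative location: every joint expressed relative to the reference joint.
    rel = poses - poses[:, :1, :]
    # Velocity: frame-to-frame difference of the relative locations.
    vel = np.diff(rel, axis=0)

    feats = []
    for signal in (rel, vel):
        n_frames, n_joints, n_dims = signal.shape
        flat = signal.reshape(n_frames, n_joints * n_dims)
        for channel in flat.T:
            # One histogram over time per joint coordinate; out-of-range
            # values are clipped into the first or last bin.
            clipped = np.clip(channel, value_range[0], value_range[1])
            hist, _ = np.histogram(clipped, bins=n_bins, range=value_range)
            feats.append(hist / max(len(channel), 1))
    return np.concatenate(feats)


def fit_pls_svd(X, labels, n_components=2):
    """Learn a compact projection with PLS-SVD (an assumed simplification).

    The projection directions are the leading left singular vectors of the
    cross-covariance between centered features and centered one-hot labels.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    Y = (labels[:, None] == classes[None, :]).astype(float)

    x_mean = X.mean(axis=0)
    Xc = X - x_mean
    Yc = Y - Y.mean(axis=0)

    U, _, _ = np.linalg.svd(Xc.T @ Yc, full_matrices=False)
    return x_mean, U[:, :n_components]


def project(X, x_mean, W):
    """Map descriptors into the learned low-dimensional space."""
    return (np.asarray(X, dtype=float) - x_mean) @ W


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Six synthetic sequences: 20 frames, 5 joints, 3 coordinates each.
    sequences = [rng.uniform(-0.5, 0.5, size=(20, 5, 3)) for _ in range(6)]
    labels = [0, 0, 1, 1, 2, 2]
    X = np.stack([pose_histogram_descriptor(s) for s in sequences])
    x_mean, W = fit_pls_svd(X, labels, n_components=2)
    print(X.shape, project(X, x_mean, W).shape)
```

Any standard classifier could then be trained on the projected descriptors; the sketch deliberately leaves that choice open, since the abstract does not specify it.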
