An Approach to Pose-Based Action Recognition

We address action recognition in videos by modeling the spatial-temporal structures of human poses. We start by improving a state of the art method for estimating human joint locations from videos. More precisely, we obtain the K-best estimations output by the existing method and incorporate additional segmentation cues and temporal constraints to select the ``best'' one. Then we group the estimated joints into five body parts (e.g. the left arm) and apply data mining techniques to obtain a representation for the spatial-temporal structures of human actions. This representation captures the spatial configurations of body parts in one frame (by spatial-part-sets) as well as the body part movements(by temporal-part-sets) which are characteristic of human actions. It is interpretable, compact, and also robust to errors on joint estimations. Experimental results first show that our approach is able to localize body joints more accurately than existing methods. Next we show that it outperforms state of the art action recognizers on the UCF sport, the Keck Gesture and the MSR-Action3D datasets.

[1]  Aaron F. Bobick,et al.  Recognition of human body motion using phase space constraints , 1995, Proceedings of IEEE International Conference on Computer Vision.

[2]  Pawan Sinha,et al.  Top-down influences on stereoscopic depth-perception , 1998, Nature Neuroscience.

[3]  Jinyan Li,et al.  Efficient mining of emerging patterns: discovering trends and differences , 1999, KDD '99.

[4]  Michael J. Black,et al.  Parameterized Modeling and Recognition of Activities , 1999, Comput. Vis. Image Underst..

[5]  James W. Davis,et al.  The Recognition of Human Movement Using Temporal Templates , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[6]  Jitendra Malik,et al.  Recognizing action at a distance , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[7]  Ivan Laptev,et al.  On Space-Time Interest Points , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[8]  B. Caputo,et al.  Recognizing human actions: a local SVM approach , 2004, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004..

[9]  Ronen Basri,et al.  Actions as space-time shapes , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[10]  Pinar Duygulu Sahin,et al.  Human Action Recognition Using Distribution of Oriented Rectangular Patches , 2007, Workshop on Human Motion.

[11]  Mubarak Shah,et al.  Action MACH a spatio-temporal Maximum Average Correlation Height filter for action recognition , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[12]  Andrew Gilbert,et al.  Scale Invariant Action Recognition Using Compound Features Mined from Dense Spatio-temporal Corners , 2008, ECCV.

[13]  David A. McAllester,et al.  A discriminatively trained, multiscale, deformable part model , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[14]  Adriana Kovashka,et al.  Learning a hierarchy of discriminative space-time neighborhood features for human action recognition , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[15]  Wanqing Li,et al.  Action recognition based on a bag of 3D points , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Workshops.

[16]  Subhransu Maji,et al.  Action recognition from a distributed representation of pose and appearance , 2011, CVPR 2011.

[17]  Dong Xu,et al.  Action recognition using context and appearance distribution features , 2011, CVPR 2011.

[18]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[19]  Cordelia Schmid,et al.  Action recognition by dense trajectories , 2011, CVPR 2011.

[20]  Wen Gao,et al.  Instantly telling what happens in a video sequence using simple features , 2011, CVPR 2011.

[21]  Mubarak Shah,et al.  Action recognition in videos acquired by a moving camera using motion decomposition of Lagrangian particle trajectories , 2011, 2011 International Conference on Computer Vision.

[22]  Yi Yang,et al.  Articulated pose estimation with flexible mixtures-of-parts , 2011, CVPR 2011.

[23]  Ioannis A. Kakadiaris,et al.  Modeling Motion of Body Parts for Action Recognition , 2011, BMVC.

[24]  Larry S. Davis,et al.  Recognizing Human Actions by Learning and Matching Shape-Motion Prototype Trees , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[25]  Ying Wu,et al.  Mining actionlet ensemble for action recognition with depth cameras , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[26]  Ran Xu,et al.  Combining Skeletal Pose with Local Motion for Human Activity Recognition , 2012, AMDO.

[27]  Jason J. Corso,et al.  Action bank: A high-level representation of activity in video , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[28]  Josephine Sullivan,et al.  Using Richer Models for Articulated Pose Estimation of Footballers , 2012, BMVC.