Human action recognition based on estimated weak poses

We present a novel method for human action recognition (HAR) based on poses estimated from image sequences. We use 3D human pose data as additional information and propose a compact human pose representation, called a weak pose, that lives in a low-dimensional space while retaining the most discriminative information for a given pose. By predicting poses from image features, we map the problem from image feature space to pose space, where a Bag of Poses (BOP) model is learned for the final goal of HAR. The BOP model modifies the classical bag-of-words pipeline by building the vocabulary from the most representative weak poses for a given action. Compared with standard k-means clustering, our vocabulary selection criterion is shown to be more efficient and more robust against the inherent challenges of action recognition. Moreover, since the ordering of poses is discriminative for action recognition, the BOP model incorporates temporal information: in essence, groups of consecutive poses are considered together when computing the vocabulary and the assignment. We tested our method on two well-known datasets, HumanEva and IXMAS, to demonstrate that weak poses help improve action recognition accuracy. The proposed method is scene-independent and is comparable with state-of-the-art methods.
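
The assignment step of a bag-of-poses model with temporal grouping can be illustrated with a minimal sketch. It assumes a vocabulary is already available; the abstract does not specify the selection criterion, so the vocabulary is simply passed in. The function names, the `window` parameter, the Euclidean nearest-neighbour assignment, and the histogram normalization are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def stack_windows(poses, window):
    """Concatenate each run of `window` consecutive poses into one vector.

    poses: array of shape (n_frames, pose_dim), one weak pose per frame.
    Returns an array of shape (n_frames - window + 1, window * pose_dim).
    """
    poses = np.asarray(poses, dtype=float)
    n_frames = poses.shape[0]
    if n_frames < window:
        raise ValueError("sequence is shorter than the temporal window")
    return np.stack(
        [poses[i:i + window].ravel() for i in range(n_frames - window + 1)]
    )


def bag_of_poses_histogram(poses, vocabulary, window):
    """Assign each temporal window to its nearest vocabulary entry
    (squared Euclidean distance, an assumption) and return the
    normalized histogram of assignments.

    vocabulary: array of shape (n_words, window * pose_dim).
    """
    windows = stack_windows(poses, window)
    vocabulary = np.asarray(vocabulary, dtype=float)
    # Pairwise squared distances, shape (n_windows, n_words).
    dists = ((windows[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    hist = np.bincount(labels, minlength=len(vocabulary)).astype(float)
    return hist / hist.sum()


# Toy usage: four 2-D poses, windows of two consecutive frames.
toy_poses = [[0, 0], [0, 0], [1, 1], [1, 1]]
toy_vocabulary = [[0, 0, 0, 0], [1, 1, 1, 1], [0, 0, 1, 1]]
toy_hist = bag_of_poses_histogram(toy_poses, toy_vocabulary, window=2)
```

Because each vocabulary entry spans several consecutive frames, two sequences that contain the same poses in a different order produce different histograms, which is the effect the temporal grouping is meant to capture. The resulting histogram would then be fed to a classifier of choice.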
