Egocentric articulated pose tracking for action recognition

Many studies on action recognition from the third-person viewpoint have shown that articulated human pose directly describes human motion and is invariant to view changes. However, conventional algorithms for estimating articulated human pose cannot handle egocentric images: they assume that the whole figure appears in the image, whereas only a few parts of the body are visible in egocentric views. In this paper, we propose a novel method for estimating human pose for action recognition from egocentric RGB-D images. Our method extracts the pose by integrating hand detection, camera pose estimation, and time-series filtering under a body-shape constraint. Experiments show that joint positions are estimated well when the detection error of the hands and arms is small. We also demonstrate that skeleton features improve action-recognition accuracy when the action contains unintended view changes.
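
To make the described pipeline concrete, the sketch below shows one way the three components named in the abstract (hand detection, camera pose estimation, time-series filtering with a body-shape constraint) could be fused: a particle filter over arm joint angles whose observation model compares the detected hand position, transformed by the estimated camera pose, with the hand position predicted by fixed limb lengths. This is a minimal, hypothetical illustration; the skeleton model, link lengths, noise parameters, and all function names are assumptions, not the authors' implementation.

```python
# Hypothetical sketch: particle-filter fusion of a per-frame hand detection
# with an estimated camera pose, under a fixed-limb-length ("body shape") constraint.
import numpy as np

rng = np.random.default_rng(0)

N_PARTICLES = 500
UPPER_ARM, FOREARM = 0.30, 0.25             # assumed link lengths in metres
SHOULDER_WORLD = np.array([0.0, 0.0, 1.4])  # assumed fixed shoulder position (world frame)


def forward_kinematics(angles):
    """Map (shoulder yaw, shoulder pitch, elbow pitch) -> elbow and hand positions."""
    yaw, pitch, elbow = angles[..., 0], angles[..., 1], angles[..., 2]
    # Direction of the upper arm.
    d1 = np.stack([np.cos(pitch) * np.cos(yaw),
                   np.cos(pitch) * np.sin(yaw),
                   np.sin(pitch)], axis=-1)
    elbow_pos = SHOULDER_WORLD + UPPER_ARM * d1
    # Forearm bends in the same vertical plane by the elbow angle (a simplification).
    d2 = np.stack([np.cos(pitch + elbow) * np.cos(yaw),
                   np.cos(pitch + elbow) * np.sin(yaw),
                   np.sin(pitch + elbow)], axis=-1)
    hand_pos = elbow_pos + FOREARM * d2
    return elbow_pos, hand_pos


def filter_step(particles, weights, hand_obs_cam, cam_R, cam_t, obs_sigma=0.05):
    """One predict/update/resample step given a hand detection in camera coordinates."""
    # Predict: random-walk motion model on joint angles.
    particles = particles + rng.normal(scale=0.05, size=particles.shape)
    # Bring the detection into the world frame using the estimated camera pose.
    hand_obs_world = cam_R @ hand_obs_cam + cam_t
    # Update: weight particles by how well their predicted hand explains the detection.
    _, hand_pred = forward_kinematics(particles)
    err = np.linalg.norm(hand_pred - hand_obs_world, axis=-1)
    weights = weights * np.exp(-0.5 * (err / obs_sigma) ** 2)
    weights /= weights.sum() + 1e-12
    # Resample when the effective sample size collapses.
    if 1.0 / np.sum(weights ** 2) < N_PARTICLES / 2:
        idx = rng.choice(N_PARTICLES, size=N_PARTICLES, p=weights)
        particles, weights = particles[idx], np.full(N_PARTICLES, 1.0 / N_PARTICLES)
    return particles, weights


# Usage sketch: one frame with a detected hand 0.4 m in front of the camera,
# and a camera pose that would come from e.g. a visual-SLAM module.
particles = rng.uniform(-1.0, 1.0, size=(N_PARTICLES, 3))
weights = np.full(N_PARTICLES, 1.0 / N_PARTICLES)
cam_R, cam_t = np.eye(3), np.array([0.0, 0.0, 1.5])
particles, weights = filter_step(particles, weights,
                                 hand_obs_cam=np.array([0.0, 0.0, 0.4]),
                                 cam_R=cam_R, cam_t=cam_t)
estimate = np.average(particles, axis=0, weights=weights)  # posterior-mean joint angles
print("estimated joint angles:", estimate)
```

The body-shape constraint enters through the fixed link lengths in the forward-kinematics model: every particle's hand hypothesis is, by construction, reachable by the assumed arm, so filtering never drifts to anatomically impossible joint positions even when the hand detection is noisy or briefly missing.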
