Image-based human pose estimation

Human pose estimation has become an active research topic in computer vision, yet the complexity of human motion leaves several technical challenges open. Depth sensors such as Kinect and Xtion open up new ways of handling these issues, but they also bring new challenges of their own. In this thesis, we address only human pose estimation frameworks based on colour images, and we explore the trade-off between effective feature representations and models. Firstly, human pose estimation can be treated as a regression problem, so we propose a novel regression-based method designed to estimate the upper-body joints and recognize their special motions. We verified the proposed method on our own recorded dataset, and the experimental results show that it is effective. This suggests that the accuracy of human joint estimation contributes significantly to human motion estimation. Secondly, computational cost is a persistent difficulty in computer vision. For example, pictorial structures normally use the interactions between connected joints, such as the elbow and shoulder, which makes the cost of inference quadratic in the number of pixels. We therefore propose a simple restricted model that only measures the quality of limb-pair possibilities. It also allows efficient inference in richer models that exploit data-dependent interactions. Thirdly, to improve the effectiveness of body pose estimation, we introduce an object tracking method into the body pose estimation process. In addition, we introduce a structured prediction aggregate model, which spends computational effort only where it is necessary and preserves accurate output by cheaply filtering out many states. Meanwhile, our proposed decomposition method uses cyclic dependencies on a tree model when imposing the model agreement, which allows efficient inference on a video or an image. Finally, we evaluate our proposed methods on public datasets and compare them with some popular methods to demonstrate both efficiency and effectiveness. The pairwise interaction potentials of the model are equipped with data-dependent features and the aggregate model. The experimental results show that our model is worthwhile and that the features used are accurate for pose estimation on popular datasets.
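
To make the cost argument concrete, the following is a minimal sketch and not the method of the thesis. It assumes a chain of parts, each with a list of candidate states (for example pixel locations), and runs exact max-sum inference; the inner loop over pairs of states is the quadratic step. The function names `max_sum_chain` and `prune`, and the choice to filter states by unary score alone, are illustrative assumptions; structured cascades typically filter on max-marginals instead.

```python
def max_sum_chain(unary, pairwise):
    """Exact MAP on a chain of parts.

    unary[k][s]       : score of part k being in state s
    pairwise[k][a][b] : score of part k in state a and part k+1 in state b
    Cost per edge is O(n_prev * n_cur), i.e. quadratic in the state count.
    """
    msg = list(unary[0])
    back = []
    for u, p in zip(unary[1:], pairwise):
        new_msg, arg = [], []
        for j in range(len(u)):
            # Scan every previous state for each current state (quadratic step).
            i_best = max(range(len(msg)), key=lambda i: msg[i] + p[i][j])
            arg.append(i_best)
            new_msg.append(msg[i_best] + p[i_best][j] + u[j])
        back.append(arg)
        msg = new_msg
    last = max(range(len(msg)), key=lambda j: msg[j])
    path = [last]
    for arg in reversed(back):  # follow back-pointers to recover the best states
        path.append(arg[path[-1]])
    return path[::-1], msg[last]


def prune(unary, pairwise, keep):
    """Keep only the `keep` best states per part (by unary score, an assumption)."""
    idx = [sorted(sorted(range(len(u)), key=lambda s: u[s])[-keep:]) for u in unary]
    u2 = [[u[s] for s in ix] for u, ix in zip(unary, idx)]
    p2 = [[[p[a][b] for b in jx] for a in ix]
          for p, ix, jx in zip(pairwise, idx, idx[1:])]
    return u2, p2, idx
```

Pruning to `keep` states per part reduces the per-edge work from n² to `keep`² pair evaluations, which is what makes richer pairwise terms affordable afterwards. Filtering on unary scores alone can discard the true optimum, so this sketch only illustrates the cost trade-off, not a safe filtering rule.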
