Human pose estimation with implicit shape models

We address the problem of articulated 2D human pose estimation in natural images. We show that a well-known person detector, the Implicit Shape Model (ISM) approach introduced by Leibe et al., is not only well suited to detecting persons but can also be exploited to derive a person's pose. To this end, we extend the original ISM voting approach and let all visual words that contribute to a person hypothesis also vote for the positions of the person's body parts. The approach is highly flexible: it is not constrained to a particular feature type, and different feature types can even be fused during pose estimation. We present preliminary evaluation results on the publicly available HumanEva dataset, which includes ground-truth pose data and thus provides both training and evaluation data.
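The extended voting step can be illustrated with a minimal sketch. It is not the authors' implementation: the codebook layout, the part names, the offsets, and the function `estimate_parts` are all hypothetical, and a weighted mean of the votes stands in for whatever vote aggregation the actual method uses. Each visual word that supported a person hypothesis casts weighted votes for body part positions, using offsets assumed to have been learned from ground-truth poses.

```python
from collections import defaultdict

def estimate_parts(features, codebook):
    """Aggregate body part votes for one person hypothesis.

    features: list of (word_id, (x, y)) for the visual words that
              contributed to the hypothesis (hypothetical format).
    codebook: word_id -> list of (part_name, (dx, dy), weight), i.e.
              part offsets assumed to be learned from training poses.
    Returns part_name -> (x, y), the weighted mean of all votes.
    """
    # Per part: running sums of weighted x, weighted y, and total weight.
    sums = defaultdict(lambda: [0.0, 0.0, 0.0])
    for word_id, (x, y) in features:
        for part, (dx, dy), weight in codebook.get(word_id, []):
            acc = sums[part]
            acc[0] += weight * (x + dx)
            acc[1] += weight * (y + dy)
            acc[2] += weight
    return {
        part: (sx / sw, sy / sw)
        for part, (sx, sy, sw) in sums.items()
        if sw > 0
    }

# Toy data (illustrative values only, not taken from the paper).
codebook = {
    0: [("head", (0.0, -40.0), 1.0)],
    1: [("head", (0.0, -80.0), 1.0), ("left_foot", (-10.0, 60.0), 0.5)],
}
features = [(0, (100.0, 90.0)), (1, (102.0, 130.0))]

pose = estimate_parts(features, codebook)
print(pose)
```

Because the sketch only depends on word identifiers and image positions, votes from words of different feature types can be pooled in the same accumulator, which is one way to read the fusion mentioned above.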
