A Survey on Model Based Approaches for 2D and 3D Visual Human Pose Recovery

Human Pose Recovery has been studied in Computer Vision for the last 40 years. Many approaches have been reported, with significant improvements in both data representation and model design. However, Human Pose Recovery in uncontrolled environments remains far from solved. In this paper, we define a general taxonomy to group model-based approaches for Human Pose Recovery, composed of five main modules: appearance, viewpoint, spatial relations, temporal consistency, and behavior. We then perform a methodological comparison following the proposed taxonomy, evaluating current state-of-the-art approaches across these five categories. Based on this comparison, we discuss the main advantages and drawbacks of the reviewed literature.
