Feature-Based Pose Estimation

In this chapter we review the challenges and methodology of feature-based predictive three-dimensional human pose reconstruction from image and video data. We argue that reliable 3D human pose prediction can be achieved through an alliance between image descriptors that encode multiple levels of selectivity and invariance and models that are capable of representing multiple structured solutions. For monocular systems, the key to reliability is the capacity to leverage prior knowledge in order to bias solutions not only toward kinematically feasible sets, but also toward typical configurations that humans are likely to assume in everyday surroundings. In this context, we discuss several predictive methods, including large-scale mixtures of experts, supervised spectral latent variable models, and structural support vector machines; assess the impact of the various choices of image descriptors; review open problems; and give pointers to datasets and code available online.
