Specialized mappings and the estimation of human body pose from a single image

We present an approach for recovering articulated body pose from a single monocular image using the Specialized Mappings Architecture (SMA), a nonlinear supervised learning architecture. An SMA consists of several specialized forward (input to output space) mapping functions and a feedback matching function, all estimated automatically from data. Each forward function maps certain (possibly disconnected) regions of the input space onto the output space. We first formalize a probabilistic model for the architecture, along with a mechanism for learning its parameters. The learning problem is posed in a maximum likelihood estimation framework, and we present expectation maximization (EM) algorithms for several different choices of the likelihood function. We compare the performance of these solutions under the different likelihood functions on the task of estimating human body posture from low-level visual features obtained from a single image, with promising results.
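The abstract describes the architecture only at a high level. The sketch below illustrates the two ideas it names: EM fitting of several specialized forward mappings, and a feedback step that selects among their hypotheses. It is a minimal sketch under simplifying assumptions made for illustration, not the formulation used in the paper. Inputs and outputs are scalars, each specialist is linear with a Gaussian likelihood y ~ N(a_k x + b_k, var_k), the mixing weights pi_k do not depend on the input, and the feedback function (assumed here to map an output hypothesis back to the input space) is known in closed form. Only this one likelihood is shown, whereas the paper compares several. The names `zeta`, `true_map`, `fit_specialists`, and `sma_infer` and the piecewise-linear toy data are hypothetical.

```python
import math

# Hypothetical feedback function: maps an output hypothesis back to the
# input space (standing in for, e.g., rendering features from a pose).
def zeta(y):
    return y if y < 1.0 else (y + 2.0) / 3.0

# Hypothetical ground-truth input -> output relation, used only to make toy data.
def true_map(x):
    return x if x < 1.0 else 3.0 * x - 2.0

def log_gauss(r, var):
    """Log density of a zero-mean Gaussian with variance var at residual r."""
    return -0.5 * (math.log(2.0 * math.pi * var) + r * r / var)

def fit_specialists(xs, ys, params, iters=200, var_floor=1e-6):
    """EM for K linear specialists y ~ N(a_k x + b_k, var_k) with mixing weights pi_k.

    Assumed Gaussian likelihood and input-independent mixing weights.
    params is a list of [a, b, var, pi] per specialist, updated in place.
    """
    n = len(xs)
    for _ in range(iters):
        # E-step: responsibility of each specialist for each pair (log domain).
        resp = []
        for x, y in zip(xs, ys):
            logs = [math.log(pi) + log_gauss(y - (a * x + b), var)
                    for a, b, var, pi in params]
            m = max(logs)
            w = [math.exp(l - m) for l in logs]
            s = sum(w)
            resp.append([wk / s for wk in w])
        # M-step: weighted least squares, variance, and mixing weight per specialist.
        for k, p in enumerate(params):
            r = [resp[i][k] for i in range(n)]
            W = sum(r)
            if W < 1e-12:
                continue  # specialist received no data; keep its parameters
            sx = sum(ri * x for ri, x in zip(r, xs))
            sy = sum(ri * y for ri, y in zip(r, ys))
            sxx = sum(ri * x * x for ri, x in zip(r, xs))
            sxy = sum(ri * x * y for ri, x, y in zip(r, xs, ys))
            den = W * sxx - sx * sx
            if abs(den) > 1e-12:
                p[0] = (W * sxy - sx * sy) / den
            p[1] = (sy - p[0] * sx) / W
            sse = sum(ri * (y - (p[0] * x + p[1])) ** 2
                      for ri, x, y in zip(r, xs, ys))
            p[2] = max(sse / W, var_floor)  # floor keeps the variance positive
            p[3] = W / n
    return params

def sma_infer(x, params):
    """Each specialist proposes a hypothesis; the feedback function keeps the one
    whose re-projection zeta(h) lies closest to the observed input x."""
    hyps = [a * x + b for a, b, _, _ in params]
    return min(hyps, key=lambda h: abs(zeta(h) - x))

xs = [i / 20.0 for i in range(41)]   # toy inputs on [0, 2]
ys = [true_map(x) for x in xs]
params = [[1.1, 0.0, 0.1, 0.5],      # initial guesses: [a, b, var, pi]
          [2.9, -1.9, 0.1, 0.5]]
fit_specialists(xs, ys, params)
print(round(sma_infer(0.3, params), 3), round(sma_infer(1.7, params), 3))
```

In this toy setting each specialist ends up owning one linear piece of the mapping, and at inference the feedback step picks the hypothesis whose re-projection agrees with the input. The variance floor matters here because the data are noise-free, so a specialist that fits its points exactly would otherwise drive its variance to zero.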
