Learning Body Pose via Specialized Maps

A nonlinear supervised learning model, the Specialized Mappings Architecture (SMA), is described and applied to the estimation of human body pose from monocular images. The SMA consists of several specialized forward mapping functions and an inverse mapping function. Each specialized function maps certain domains of the input space (image features) onto the output space (body pose parameters). The key algorithmic problems faced are those of learning the specialized domains and mapping functions in an optimal way, as well as performing inference given inputs and knowledge of the inverse function. Solutions to these problems employ the EM algorithm and alternating choices of conditional independence assumptions. Performance of the approach is evaluated with synthetic and real video sequences of human motion.

[1]  H. Hartley Maximum Likelihood Estimation from Incomplete Data , 1958 .

[2]  G. Johansson Visual perception of biological motion and a model for its analysis , 1973 .

[3]  R. Okafor Maximum likelihood estimation from incomplete data , 1987 .

[4]  Robert A. Jacobs,et al.  Hierarchical Mixtures of Experts and the EM Algorithm , 1993, Neural Computation.

[5]  Takeo Kanade,et al.  Model-based tracking of self-occluding articulated objects , 1995, Proceedings of IEEE International Conference on Computer Vision.

[6]  Michael Isard,et al.  Contour Tracking by Stochastic Propagation of Conditional Density , 1996, ECCV.

[7]  Brian Sallans,et al.  A Hierarchical Community of Experts , 1999, Learning in Graphical Models.

[8]  Jitendra Malik,et al.  Tracking people with twists and exponential maps , 1998, Proceedings. 1998 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No.98CB36231).

[9]  Geoffrey E. Hinton,et al.  A View of the Em Algorithm that Justifies Incremental, Sparse, and other Variants , 1998, Learning in Graphical Models.

[10]  Matthew Brand,et al.  Shadow puppetry , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[11]  William T. Freeman,et al.  Bayesian Reconstruction of 3D Human Motion from Single-Camera Video , 1999, NIPS.

[12]  Michael J. Black,et al.  Learning and Tracking Cyclic Human Motion , 2000, NIPS.

[13]  Rómer Rosales,et al.  Specialized mappings and the estimation of human body pose from a single image , 2000, Proceedings Workshop on Human Motion.

[14]  Andrew Blake,et al.  Articulated body motion capture by annealed particle filtering , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[15]  Vladimir Pavlovic,et al.  Learning Switching Linear Models of Human Motion , 2000, NIPS.

[16]  Yang Song,et al.  Towards detection of human motion , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).