Beyond trees: common-factor models for 2D human pose recovery

Tree structured models have been widely used for determining the pose of a human body, from either 2D or 3D data. While such models can effectively represent the kinematic constraints of the skeletal structure, they do not capture additional constraints such as coordination of the limbs. Tree structured models thus miss an important source of information about human body pose, as limb coordination is necessary for balance while standing, walking, or running, as well as being evident in other activities such as dancing and throwing. In this paper, we consider the use of undirected graphical models that augment a tree structure with latent variables in order to account for coordination between limbs. We refer to these as common-factor models, since they are constructed by using factor analysis to identify additional correlations in limb position that are not accounted for by the kinematic tree structure. These common-factor models have an underlying tree structure and thus a variant of the standard Viterbi algorithm for a tree can be applied for efficient estimation. We present some experimental results contrasting common-factor models with tree models, and quantify the improvement in pose estimation for 2D image data.

[1]  Adnan Darwiche,et al.  Inference in belief networks: A procedural guide , 1996, Int. J. Approx. Reason..

[2]  G. McLachlan,et al.  The EM algorithm and extensions , 1996 .

[3]  Michael J. Black,et al.  Cardboard people: a parameterized model of articulated image motion , 1996, Proceedings of the Second International Conference on Automatic Face and Gesture Recognition.

[4]  Jitendra Malik,et al.  Tracking people with twists and exponential maps , 1998, Proceedings. 1998 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No.98CB36231).

[5]  Dariu Gavrila,et al.  The Visual Analysis of Human Movement: A Survey , 1999, Comput. Vis. Image Underst..

[6]  Daniel P. Huttenlocher,et al.  Efficient matching of pictorial structures , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[7]  Andrew Blake,et al.  Articulated body motion capture by annealed particle filtering , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[8]  David A. Forsyth,et al.  Mixtures of trees for object recognition , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[9]  Andrew Blake,et al.  Probabilistic tracking in a metric space , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[10]  James M. Coughlan,et al.  Finding Deformable Shapes Using Loopy Belief Propagation , 2002, ECCV.

[11]  Yang Song,et al.  Unsupervised Learning of Human Motion , 2003, IEEE Trans. Pattern Anal. Mach. Intell..

[12]  David A. Forsyth,et al.  Finding and tracking people from the bottom up , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[13]  Jitendra Malik,et al.  Recovering human body configurations: combining segmentation and recognition , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[14]  Nir Friedman,et al.  "Ideal Parent" Structure Learning for Continuous Variable Networks , 2004, UAI.

[15]  Sidharth Bhatia,et al.  Tracking loose-limbed people , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..