Viewpoint invariant exemplar-based 3D human tracking

This paper proposes a clustered exemplar-based model for viewpoint-invariant tracking of the 3D motion of a human subject from a single camera. Each exemplar associates multi-view visual information about a person with the corresponding 3D skeletal pose. The visual information takes the form of contours obtained from different viewpoints around the subject. Including multi-view information is important for two reasons: viewpoint invariance and generalisation to novel motions. Visual tracking of human motion is performed using a particle filter coupled to the dynamics of human movement represented by the exemplar-based model. Dynamics are modelled by clustering 3D skeletal motions with similar movement and encoding the flow both within and between clusters. Results of single-view tracking demonstrate that the exemplar-based models incorporating dynamics generalise to viewpoint-invariant tracking of novel movements.
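As a rough illustration of how a particle filter can be coupled to cluster-based dynamics, the following minimal sketch may help. It is not the authors' implementation: the function names, the count-based transition estimate, the uniform choice of an exemplar within a cluster, and the `likelihood` callable (a stand-in for some contour-matching score) are all assumptions made for this example.

```python
import random


def build_transitions(cluster_sequence, num_clusters, smoothing=1e-3):
    """Estimate row-normalised cluster-to-cluster transition probabilities
    by counting consecutive label pairs in a training sequence.
    (Assumed scheme; the paper's own flow encoding may differ.)"""
    counts = [[smoothing] * num_clusters for _ in range(num_clusters)]
    for a, b in zip(cluster_sequence, cluster_sequence[1:]):
        counts[a][b] += 1.0
    return [[c / sum(row) for c in row] for row in counts]


def particle_filter_step(particles, transitions, members, likelihood, rng):
    """One predict / weight / resample step over (cluster, exemplar) particles.

    members[c] lists the exemplar ids belonging to cluster c.
    likelihood(exemplar_id) returns a non-negative score for the current
    frame; here it is a hypothetical placeholder for contour matching.
    """
    cluster_ids = range(len(transitions))
    # Predict: move each particle according to the learned cluster flow,
    # then pick an exemplar inside the destination cluster.
    predicted = []
    for cluster, _ in particles:
        nxt = rng.choices(cluster_ids, weights=transitions[cluster])[0]
        predicted.append((nxt, rng.choice(members[nxt])))
    # Weight: score each hypothesis against the observation.
    weights = [likelihood(exemplar) for _, exemplar in predicted]
    if sum(weights) <= 0:
        weights = [1.0] * len(predicted)  # fall back to uniform weights
    # Resample: draw a new particle set in proportion to the weights.
    return rng.choices(predicted, weights=weights, k=len(predicted))


# Toy usage: two clusters that alternate, four exemplars in total.
rng = random.Random(0)
transitions = build_transitions([0, 1, 0, 1], num_clusters=2)
members = [[0, 1], [2, 3]]
particles = [(0, 0)] * 50
particles = particle_filter_step(
    particles, transitions, members,
    likelihood=lambda e: 1.0 if e == 2 else 0.0, rng=rng)
```

In this toy run the observation only supports exemplar 2, so resampling concentrates the particle set on that exemplar. In a real tracker the likelihood would compare the image contour with each exemplar's stored multi-view contours, which is what would let a single camera select both pose and viewpoint.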
