From Canonical Poses to 3D Motion Capture Using a Single Camera

We combine detection and tracking techniques to achieve robust 3D motion recovery of people seen from arbitrary viewpoints by a single and potentially moving camera. We rely on detecting key postures, which can be done reliably, using a motion model to infer 3D poses between consecutive detections, and finally refining them over the whole sequence using a generative model. We demonstrate our approach in the cases of golf motions filmed using a static camera and walking motions acquired using a potentially moving one. We will show that our approach, although monocular, is both metrically accurate because it integrates information over many frames and robust because it can recover from a few misdetections.

[1]  Andrew W. Fitzgibbon,et al.  The Joint Manifold Model for Semi-supervised Multi-valued Regression , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[2]  A. Elgammal,et al.  Body Pose Tracking From Uncalibrated Camera Using Supervised Manifold Learning , 2006 .

[3]  Andrea Fossati,et al.  Linking Pose and Motion , 2008, ECCV.

[4]  David J. Fleet,et al.  Physics-Based Person Tracking Using Simplified Lower-Body Dynamics , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[5]  Vincent Lepetit,et al.  Human body pose detection using Bayesian spatio-temporal templates , 2006, Comput. Vis. Image Underst..

[6]  Adrian Hilton,et al.  Viewpoint invariant exemplar-based 3D human tracking , 2006, Comput. Vis. Image Underst..

[7]  Michael J. Black,et al.  HumanEva: Synchronized Video and Motion Capture Dataset and Baseline Algorithm for Evaluation of Articulated Human Motion , 2010, International Journal of Computer Vision.

[8]  David J. Fleet,et al.  People tracking using hybrid Monte Carlo filtering , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[9]  Cristian Sminchisescu,et al.  Fast algorithms for large scale conditional 3D prediction , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[10]  Michael J. Black,et al.  Learning the Statistics of People in Images and Video , 2003, International Journal of Computer Vision.

[11]  Bernhard P. Wrobel,et al.  Multiple View Geometry in Computer Vision , 2001 .

[12]  Rui Li,et al.  3D Human Motion Tracking with a Coordinated Mixture of Factor Analyzers , 2009, International Journal of Computer Vision.

[13]  Stefan Carlsson,et al.  Recognizing and Tracking Human Action , 2002, ECCV.

[14]  David J. Fleet,et al.  Temporal motion models for monocular and multiview 3D human body tracking , 2006, Comput. Vis. Image Underst..

[15]  Gang Hua,et al.  Tracking articulated body by dynamic Markov network , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[16]  David A. Forsyth,et al.  Tracking People by Learning Their Appearance , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17]  Trevor Darrell,et al.  Conditional Random People: Tracking Humans with CRFs and Grid Filters , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[18]  Ahmed M. Elgammal,et al.  Inferring 3D body pose from silhouettes using activity manifold learning , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[19]  Andrew Blake,et al.  Articulated body motion capture by annealed particle filtering , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[20]  Michael J. Black,et al.  HumanEva: Synchronized Video and Motion Capture Dataset for Evaluation of Articulated Human Motion , 2006 .

[21]  Ankur Agarwal,et al.  3D human pose from silhouettes by relevance vector regression , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[22]  Ken Shoemake,et al.  Animating rotation with quaternion curves , 1985, SIGGRAPH.

[23]  Dariu Gavrila,et al.  Real-time object detection for "smart" vehicles , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[24]  Andrew Blake,et al.  Markerless motion capture of complex full-body movement for character anima-tion , 2001, CVPR 2000.

[25]  Stefan Carlsson,et al.  Monocular 3D Reconstruction of Human Motion in Long Action Sequences , 2004, ECCV.

[26]  James M. Rehg,et al.  Reconstruction of 3D figure motion from 2D correspondences , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[27]  Cordelia Schmid,et al.  Face detection in a video sequence - a temporal approach , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[28]  Michael J. Black,et al.  Implicit Probabilistic Models of Human Motion for Synthesis and Tracking , 2002, ECCV.

[29]  Jitendra Malik,et al.  Recovering human body configurations: combining segmentation and recognition , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[30]  Trevor Darrell,et al.  Sparse probabilistic regression for activity-independent human pose inference , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[31]  Hans-Peter Seidel,et al.  Scaled Motion Dynamics for Markerless Motion Capture , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[32]  James M. Rehg,et al.  Reconstruction of 3-D Figure Motion from 2-D Correspondences , 2001, CVPR 2001.

[33]  Michael Isard,et al.  BraMBLe: a Bayesian multiple-blob tracker , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[34]  B. Triggs,et al.  Tracking Articulated Motion with Piecewise Learned Dynamical Models , 2004 .

[35]  Carlo Tomasi,et al.  3D tracking = classification + interpolation , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[36]  Bernt Schiele,et al.  Pedestrian detection in crowded scenes , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[37]  Michael J. Black,et al.  The Naked Truth: Estimating Body Shape Under Clothing , 2008, ECCV.

[38]  David J. Fleet,et al.  Physics-Based Human Pose Tracking , 2006 .

[39]  Andrew W. Fitzgibbon,et al.  Markerless tracking using planar structures in the scene , 2000, Proceedings IEEE and ACM International Symposium on Augmented Reality (ISAR 2000).

[40]  Clark F. Olson,et al.  Automatic target recognition by matching oriented edge pixels , 1997, IEEE Trans. Image Process..

[41]  David J. Fleet,et al.  Stochastic Tracking of 3D Human Figures Using 2D Image Motion , 2000, ECCV.

[42]  Qiang Wang,et al.  Learning object intrinsic structure for robust visual tracking , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[43]  Rui Li,et al.  Monocular Tracking of 3D Human Motion with a Coordinated Mixture of Factor Analyzers , 2006, ECCV.

[44]  Cristian Sminchisescu,et al.  Discriminative density propagation for 3D human motion estimation , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[45]  Dariu Gavrila,et al.  A Bayesian Framework for Multi-cue 3D Object Tracking , 2004, ECCV.

[46]  David J. Fleet,et al.  3D People Tracking with Gaussian Process Dynamical Models , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[47]  Michael J. Black,et al.  Learning and Tracking Cyclic Human Motion , 2000, NIPS.

[48]  Trevor Darrell,et al.  Fast pose estimation with parameter-sensitive hashing , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.