Markerless monocular motion capture using image features and physical constraints

We present a technique for extracting the motion parameters of a human figure from a single video stream. Our goal is to prototype motion synthesis rapidly for game design and animation applications. The approach is especially useful where conventional motion capture systems are impractical because of the instrumentation they require, and it can also be used to synthesize motion from archival footage. By extracting the silhouette of the foreground figure and using a model-based approach, we reformulate the problem as a local optimization over the pose space. The pose space consists of six rigid-body transformation parameters plus the internal joint angles of the figure. To evaluate goodness of fit, the silhouette of the figure in the captured video is compared against the silhouette of a synthetic figure using a pixel-by-pixel, distance-based cost function. With only a single video stream, occlusion and the ambiguities inherent in a single view often cause spurious reconstructions of the captured motion. By exploiting temporal coherence, physical constraints, and knowledge of human anatomy, a viable pose sequence can nevertheless be reconstructed for many live-action sequences.
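
As a rough illustration of the silhouette comparison and local search described above, the following Python sketch scores a synthetic silhouette against an observed one and then greedily adjusts the pose to reduce that score. Everything here is an assumption made for illustration: the abstract does not specify the distance measure, the optimizer, or the renderer, so the sketch uses a city-block distance map, a simple coordinate-wise search, and a hypothetical `render_box` function in which a two-parameter translating rectangle stands in for the projected articulated figure.

```python
from collections import deque


def distance_map(mask):
    """City-block distance from each pixel to the nearest set pixel (multi-source BFS)."""
    h, w = len(mask), len(mask[0])
    dist = [[None] * w for _ in range(h)]
    queue = deque()
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                dist[y][x] = 0
                queue.append((y, x))
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
            if 0 <= ny < h and 0 <= nx < w and dist[ny][nx] is None:
                dist[ny][nx] = dist[y][x] + 1
                queue.append((ny, nx))
    # An empty mask leaves every entry unset; fall back to a large constant.
    return [[h + w if d is None else d for d in row] for row in dist]


def silhouette_cost(observed, synthetic):
    """Pixel-by-pixel cost: each mismatched pixel pays its distance to the other silhouette."""
    d_obs = distance_map(observed)
    d_syn = distance_map(synthetic)
    cost = 0
    for y in range(len(observed)):
        for x in range(len(observed[0])):
            if synthetic[y][x] and not observed[y][x]:
                cost += d_obs[y][x]
            elif observed[y][x] and not synthetic[y][x]:
                cost += d_syn[y][x]
    return cost


def render_box(pose, height=12, width=12, box=(4, 3)):
    """Hypothetical stand-in renderer: a fixed-size rectangle translated by pose = (tx, ty)."""
    tx, ty = pose
    bh, bw = box
    return [[1 if ty <= y < ty + bh and tx <= x < tx + bw else 0
             for x in range(width)] for y in range(height)]


def local_search(observed, pose, max_iters=100):
    """Greedy local search: perturb one pose parameter at a time, keep any improvement."""
    best = silhouette_cost(observed, render_box(pose))
    for _ in range(max_iters):
        improved = False
        for i in range(len(pose)):
            for step in (-1, 1):
                candidate = list(pose)
                candidate[i] += step
                candidate = tuple(candidate)
                cost = silhouette_cost(observed, render_box(candidate))
                if cost < best:
                    pose, best, improved = candidate, cost, True
        if not improved:
            break
    return pose, best


observed = render_box((6, 5))           # "captured" silhouette
start = (3, 3)                          # initial pose guess
start_cost = silhouette_cost(observed, render_box(start))
pose, cost = local_search(observed, start)
print(start_cost, pose, cost)
```

In a full system the pose vector would hold the six rigid-body parameters plus the joint angles, and the initial guess for each frame would typically come from the previous frame's result, which is one way temporal coherence can keep the search local.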
