Human Body Model Acquisition and Tracking Using Voxel Data

We present an integrated system for automatic acquisition of a human body model and for motion tracking, using input from multiple synchronized video streams. The video frames are segmented, and a 3D voxel reconstruction of the human body shape is computed for each frame from the foreground silhouettes. These reconstructions are then used as input to the model acquisition and tracking algorithms.

The human body model consists of ellipsoids and cylinders and is described using the twists framework, which yields a non-redundant set of model parameters. Model acquisition starts with a simple body part localization procedure based on template fitting and growing, which uses prior knowledge of average body part shapes and dimensions. The initial model is then refined using a Bayesian network that imposes human body proportions onto the body part size estimates. The tracker is an extended Kalman filter that estimates the model parameters from measurements made on the labeled voxel data. A voxel labeling procedure designed to handle large frame-to-frame displacements makes the tracking robust.

Extensive evaluation shows that the system performs reliably on sequences that include different types of motion, such as walking, sitting, dancing, running, and jumping, and on people of very different body sizes, from a nine-year-old girl to a tall adult male.
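The voxel reconstruction step described above can be illustrated with a minimal shape-from-silhouette sketch: a voxel is kept only if it projects onto a foreground pixel in every camera view. This is not the system's implementation. The function names (`carve_voxels`, `is_foreground`), the tiny 4x4x4 grid, and the two orthographic toy projections are assumptions made for illustration; a real setup would use calibrated perspective cameras.

```python
from typing import Callable, List, Sequence, Tuple

Voxel = Tuple[int, int, int]
Pixel = Tuple[int, int]                 # (row, col)
Silhouette = Sequence[Sequence[bool]]   # silhouette[row][col], True = foreground


def is_foreground(silhouette: Silhouette, pixel: Pixel) -> bool:
    """Return True if the pixel lies inside the image and is foreground."""
    row, col = pixel
    if 0 <= row < len(silhouette) and 0 <= col < len(silhouette[row]):
        return bool(silhouette[row][col])
    return False


def carve_voxels(
    grid_size: int,
    silhouettes: List[Silhouette],
    projections: List[Callable[[Voxel], Pixel]],
) -> List[Voxel]:
    """Keep the voxels whose projection is foreground in every view."""
    occupied = []
    for x in range(grid_size):
        for y in range(grid_size):
            for z in range(grid_size):
                voxel = (x, y, z)
                if all(
                    is_foreground(sil, proj(voxel))
                    for sil, proj in zip(silhouettes, projections)
                ):
                    occupied.append(voxel)
    return occupied


# Hypothetical toy cameras: orthographic views along the z and x axes.
def project_front(v: Voxel) -> Pixel:
    return (v[1], v[0])  # image row = y, image col = x


def project_side(v: Voxel) -> Pixel:
    return (v[1], v[2])  # image row = y, image col = z


def make_mask(size: int, rows: set, cols: set) -> List[List[bool]]:
    """Build a square silhouette that is foreground on rows x cols."""
    return [[(r in rows and c in cols) for c in range(size)] for r in range(size)]


front = make_mask(4, rows={1, 2}, cols={1, 2})  # constrains y and x
side = make_mask(4, rows={1, 2}, cols={0, 1})   # constrains y and z
voxels = carve_voxels(4, [front, side], [project_front, project_side])
```

In this toy example the carved volume is the intersection of the two silhouette extrusions, that is, the voxels with x in {1, 2}, y in {1, 2}, and z in {0, 1}. The labeled voxel set produced by such a step is what the model acquisition and tracking stages would consume.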
