Human Body Posture via Hierarchical Evolutionary Optimization

This paper presents an evolutionary approach to estimating upper-body posture from multi-view markerless sequences. We fit a 24-DOF skeleton model to sparse 3-D stereo data from an array of cameras. We use a particle swarm optimization algorithm, which is intrinsically parallel, can incorporate constraints, and does not require motion models. We subdivide the high-dimensional search space based on limb dynamics observed in application sequences and perform hierarchical fitting from the least to the most uncertain body parts. We show experimentally that this scheme reduces the error more sharply than non-hierarchical optimization. We report results with 3-D scanner data of a model human and with noisy, calibrated stereo disparity maps of a real videoconferencing scene.

1 Introduction and motivation

This paper presents an evolutionary approach to estimating upper-body posture from multi-view markerless sequences. Our reference application is immersive videoconferencing, which aims to create an impression of presence, or co-location, among a group of participants situated at different geographical locations but meeting in a common, augmented reality space [17, 18, 19]. Vital for presence are 3-D visual cues, e.g., rendering figures consistently with the viewer's instantaneous viewpoint. This requires estimates of the 3-D structure of the scene, which are typically obtained by multi-view disparity analysis (see [1] and references above). Disparity maps (DMs) allow IBR-oriented systems to perform novel view synthesis consistent with the target viewpoint.

A major problem is to obtain, in real time, DMs of the moving human body of sufficiently high quality, as typical immersive environments use large screens. Note that videoconferencing setups rarely allow large numbers of cameras surrounding the scene [2], and occlusions created by gestures make DMs sparse. Assuming suitable computing power, integrating bottom-up disparities with body models can yield DMs of superior quality.
Previously, only local DM enhancement has been attempted, by infilling [3, 4]. To take this idea further, we intend to fit human body models to DMs of multi-view videoconferencing sequences. This paper concentrates on the first step, skeleton fitting for posture estimation, and introduces a novel evolutionary approach to the high-dimensional optimization problems typical of body model fitting with markerless sequences.
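The hierarchical particle swarm idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy cost function, the grouping of parameters, the box bounds standing in for constraints, and all names and parameter values (`pso_subspace`, `hierarchical_pso`, `w`, `c1`, `c2`) are assumptions made for illustration only. The sketch optimizes one group of parameters at a time with a standard global-best particle swarm, keeping the remaining parameters fixed, and clamps positions to the bounds.

```python
import random


def pso_subspace(cost, x0, dims, bounds, n_particles=20, n_iters=60,
                 w=0.7, c1=1.5, c2=1.5, rng=None):
    """Minimize cost(x) over the coordinates listed in `dims` only.

    All other coordinates stay fixed at their values in x0.
    Box bounds are enforced by clamping (a simple stand-in for constraints).
    """
    rng = rng or random.Random(0)

    def full(p):
        # Embed the sub-vector p into a copy of the full parameter vector.
        x = list(x0)
        for d, v in zip(dims, p):
            x[d] = v
        return x

    def clamp(v, d):
        lo, hi = bounds[d]
        return min(max(v, lo), hi)

    # One particle starts at the current estimate, so a stage never ends
    # with a higher cost than it started with; the rest are sampled uniformly.
    pos = [[x0[d] for d in dims]]
    pos += [[rng.uniform(*bounds[d]) for d in dims]
            for _ in range(n_particles - 1)]
    vel = [[0.0] * len(dims) for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_cost = [cost(full(p)) for p in pos]
    g = min(range(n_particles), key=pbest_cost.__getitem__)
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for k, d in enumerate(dims):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][k] = (w * vel[i][k]
                             + c1 * r1 * (pbest[i][k] - pos[i][k])
                             + c2 * r2 * (gbest[k] - pos[i][k]))
                pos[i][k] = clamp(pos[i][k] + vel[i][k], d)
            c = cost(full(pos[i]))
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[i][:], c
    return full(gbest), gbest_cost


def hierarchical_pso(cost, x0, groups, bounds, **kwargs):
    """Fit parameter groups one after another, in the order given."""
    x = list(x0)
    history = [cost(x)]  # cost before any stage, then after each stage
    for dims in groups:
        x, c = pso_subspace(cost, x, dims, bounds, **kwargs)
        history.append(c)
    return x, history


# Toy stand-in for a model-to-data error: squared distance to a target pose.
TARGET = [0.3, -0.2, 0.5, 0.1, -0.4, 0.25]


def toy_cost(x):
    return sum((a - b) ** 2 for a, b in zip(x, TARGET))


# Hypothetical grouping, ordered from least to most uncertain parameters.
GROUPS = [[0, 1], [2, 3], [4, 5]]
BOUNDS = [(-1.0, 1.0)] * 6
```

Calling `hierarchical_pso(toy_cost, [0.0] * 6, GROUPS, BOUNDS)` returns the fitted vector and the cost recorded before the first stage and after each stage. In a real system the cost would measure the discrepancy between the skeleton model and the 3-D data, and the groups would follow the body-part ordering described in the paper.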

[1] Ian D. Reid, et al. Articulated Body Motion Capture by Stochastic Search, 2005, International Journal of Computer Vision.

[2] Mohan M. Trivedi, et al. Human Body Model Acquisition and Tracking Using Voxel Data, 2003, International Journal of Computer Vision.

[3] W. Gropp, et al. Using MPI-2nd Edition, 1999.

[4] Pascal Fua, et al. Style-Based Motion Synthesis, 2004, Comput. Graph. Forum.

[5] Emanuele Trucco, et al. Human Body Pose Estimation with PSO, 2006, 2006 IEEE International Conference on Evolutionary Computation.

[6] Michael J. Black, et al. Implicit Probabilistic Models of Human Motion for Synthesis and Tracking, 2002, ECCV.

[7] Oliver Schreer, et al. Real-Time Disparity Analysis for Immersive 3-D Teleconferencing by Hybrid Recursive Matching and Census Transform, 2001, ICCV 2001.

[8] Jitendra Malik, et al. Twist Based Acquisition and Tracking of Animal and Human Kinematics, 2004, International Journal of Computer Vision.

[9] Andrew Blake, et al. Gaze manipulation for one-to-one teleconferencing, 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[10] Hans-Peter Seidel, et al. Free-viewpoint video of human actors, 2003, ACM Trans. Graph.

[11] Thomas Malzbender, et al. The Coliseum Immersive Teleconferencing System, 2002.

[12] Michael J. Black, et al. Automatic Detection and Tracking of Human Motion with a View-Based Representation, 2002, ECCV.

[13] Riccardo Poli, et al. Particle swarm optimization, 1995, Swarm Intelligence.

[14] Konstantinos E. Parsopoulos, et al. Particle Swarm Optimizer in Noisy and Continuously Changing Environments, 2001.

[15] Pascal Fua, et al. Tracking and Modeling People in Video Sequences, 2001, Comput. Vis. Image Underst.

[16] Russell C. Eberhart, et al. Tracking and optimizing dynamic systems with particle swarms, 2001, Proceedings of the 2001 Congress on Evolutionary Computation (IEEE Cat. No.01TH8546).

[17] Takeo Kanade, et al. Virtualized Reality: Constructing Virtual Worlds from Real Scenes, 1997, IEEE Multim.

[18] Zoran Popovic, et al. Articulated body deformation from range scan data, 2002, SIGGRAPH.

[19] Avideh Zakhor, et al. View generation for three-dimensional scenes from video sequences, 1997, IEEE Trans. Image Process.

[20] Russell C. Eberhart, et al. Adaptive particle swarm optimization: detection and response to dynamic systems, 2002, Proceedings of the 2002 Congress on Evolutionary Computation. CEC'02 (Cat. No.02TH8600).

[21] Zoran Popovic, et al. The space of human body shapes: reconstruction and parameterization from range scans, 2003, ACM Trans. Graph.

[22] Emanuele Trucco, et al. Dense wide-baseline disparities from conventional stereo for immersive videoconferencing, 2004, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004.

[23] Cristian Sminchisescu, et al. Covariance scaled sampling for monocular 3D body tracking, 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.