Upper body pose estimation from stereo and hand-face tracking

In applications such as immersive telepresence, we want to extract high-quality 3D models of collaborators in real time from multiview image sequences. One way to improve the quality of stereo- or visual-hull-based models is to first estimate the kinematic pose of the user and then constrain the 3D reconstruction accordingly. To serve as a preprocessing step, such pose extraction must be very fast, which precludes the usual generate-and-test techniques. We examine a method based on psychophysical evidence that a known relative hand position can be used to compute the pose of the arm directly. First, we explore a number of possible models for this relationship using motion capture data. We then examine how reconstructing the face and hand positions, together with a patch on the torso, allows us to exploit these simple direct calculations to estimate the pose of a user in a desktop collaboration environment.
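
As a rough illustration of what a direct (non-iterative) arm calculation can look like, the sketch below recovers the interior elbow angle from the shoulder and hand positions with the law of cosines. It is not the model examined in this work: the function name `elbow_angle`, the segment lengths, and the coordinates are all assumptions made for the example, and the result leaves the rotation of the elbow about the shoulder-hand axis undetermined, which needs a further model or constraint.

```python
import math

def elbow_angle(shoulder, hand, upper_arm_len, forearm_len):
    """Interior elbow angle in radians (pi means a fully extended arm).

    Hypothetical helper: uses only the law of cosines on the triangle
    formed by the shoulder, elbow, and hand.
    """
    # Distance from shoulder to hand in the same units as the segment lengths.
    d = math.dist(shoulder, hand)
    # Clamp to the reachable range so noisy measurements stay feasible.
    d_min = abs(upper_arm_len - forearm_len)
    d_max = upper_arm_len + forearm_len
    d = max(d_min, min(d, d_max))
    # Law of cosines: d^2 = a^2 + b^2 - 2ab*cos(elbow).
    cos_e = (upper_arm_len**2 + forearm_len**2 - d**2) / (
        2.0 * upper_arm_len * forearm_len
    )
    # Guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cos_e)))

# Assumed example values: 0.3 m upper arm and forearm, shoulder at the origin.
straight = elbow_angle((0.0, 0.0, 0.0), (0.6, 0.0, 0.0), 0.3, 0.3)
bent = elbow_angle((0.0, 0.0, 0.0), (0.3, 0.3, 0.0), 0.3, 0.3)
```

Because the calculation is closed-form, it costs a handful of arithmetic operations per frame, which is the property that makes direct methods attractive as a preprocessing step.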
