Fast online human pose estimation via 3D voxel data

In this paper, a novel approach is proposed for recovering human body pose from 3D voxel data. Using voxel data makes the estimation viewpoint-free, so the training model does not need to be rebuilt for different multi-camera arrangements. Other notable aspects of our approach are real-time speed (up to 30 FPS), flexibility toward various complex motions, and robustness to voxel noise. Our approach is example-based: human posture candidates are constructed beforehand from a large motion capture database, and the most appropriate posture is estimated for each frame by comparing the likelihoods of the posture candidates given the 3D voxel data. The evaluation is formulated by introducing a histogram-based feature vector that represents the 3D shape context of the human body. In addition, a fast near-neighbor search is performed before the evaluation step to reduce the computational cost and ensure real-time processing. Estimation stability is further improved by a graphical model of motion, which adds a smoothing effect to the motion sequence. We demonstrate the effectiveness of our approach with experiments on both synthetic and real image sequences.
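The histogram-based 3D shape context feature can be illustrated with a minimal sketch. Everything below is an assumption for illustration, not the paper's exact definition. It assumes that occupied voxels are given as (x, y, z) centers with z as the vertical axis. The bins are assumed to be spherical cells (radius, azimuth, elevation) around the voxel centroid, with the radius normalized by the largest centroid distance. Two histograms are compared with a Bhattacharyya-coefficient-based distance. The names `shape_context_histogram` and `bhattacharyya_distance` and the bin counts are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Voxel = Tuple[float, float, float]


def shape_context_histogram(voxels: Sequence[Voxel],
                            n_radial: int = 4,
                            n_azimuth: int = 8,
                            n_elevation: int = 4) -> List[float]:
    """Hypothetical 3D shape-context feature: normalized counts of occupied
    voxels in spherical bins centered on the voxel centroid (z is vertical)."""
    if not voxels:
        raise ValueError("need at least one occupied voxel")
    n = len(voxels)
    cx = sum(v[0] for v in voxels) / n
    cy = sum(v[1] for v in voxels) / n
    cz = sum(v[2] for v in voxels) / n
    offsets = [(x - cx, y - cy, z - cz) for x, y, z in voxels]
    radii = [math.sqrt(dx * dx + dy * dy + dz * dz) for dx, dy, dz in offsets]
    r_max = max(radii) or 1.0  # scale normalization; guards a single-voxel input
    hist = [0.0] * (n_radial * n_azimuth * n_elevation)
    for (dx, dy, dz), r in zip(offsets, radii):
        ri = min(int(r / r_max * n_radial), n_radial - 1)
        az = math.atan2(dy, dx) + math.pi  # azimuth shifted into [0, 2*pi]
        ai = min(int(az / (2 * math.pi) * n_azimuth), n_azimuth - 1)
        # elevation measured from +z, in [0, pi]; clamp guards rounding error
        el = math.acos(max(-1.0, min(1.0, dz / r))) if r > 0 else 0.0
        ei = min(int(el / math.pi * n_elevation), n_elevation - 1)
        hist[(ri * n_azimuth + ai) * n_elevation + ei] += 1.0
    return [h / n for h in hist]  # divide by voxel count so bins sum to 1


def bhattacharyya_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """sqrt(1 - BC), where BC is the Bhattacharyya coefficient of p and q."""
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))
    return math.sqrt(max(0.0, 1.0 - bc))
```

Dividing by the voxel count and by the maximum radius makes the feature roughly insensitive to the number of voxels and to overall body size. The angular bins, however, are tied to the world frame, so the body's orientation still changes the histogram.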
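The abstract does not say which fast near-neighbor search is used. One possible instantiation, assumed here, is locality-sensitive hashing with the p-stable (Gaussian random projection) family for Euclidean distance. The feature vectors of all posture candidates are indexed once offline. At run time, only the candidates that share a hash bucket with the query in at least one table are re-ranked by exact distance. The class name `EuclideanLSH` and the defaults for `n_tables`, `n_hashes`, and `width` are hypothetical.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


class EuclideanLSH:
    """Hypothetical random-projection LSH index for approximate near-neighbor
    search over fixed-length feature vectors (e.g. shape-context histograms)."""

    def __init__(self, dim: int, n_tables: int = 8, n_hashes: int = 4,
                 width: float = 0.5, seed: int = 0):
        rng = random.Random(seed)
        self.width = width
        # Each table uses n_hashes (Gaussian direction, uniform offset) pairs.
        self.tables = [[([rng.gauss(0.0, 1.0) for _ in range(dim)],
                         rng.uniform(0.0, width))
                        for _ in range(n_hashes)]
                       for _ in range(n_tables)]
        self.buckets: List[Dict[Tuple[int, ...], List[int]]] = [
            defaultdict(list) for _ in range(n_tables)]
        self.data: List[Sequence[float]] = []

    def _key(self, table, v: Sequence[float]) -> Tuple[int, ...]:
        # h(v) = floor((a . v + b) / w) for every hash in the table
        return tuple(
            math.floor((sum(a * x for a, x in zip(direction, v)) + offset) / self.width)
            for direction, offset in table)

    def add(self, v: Sequence[float]) -> int:
        """Index one candidate vector; returns its integer id."""
        idx = len(self.data)
        self.data.append(v)
        for table, bucket in zip(self.tables, self.buckets):
            bucket[self._key(table, v)].append(idx)
        return idx

    def query(self, q: Sequence[float], k: int = 1) -> List[int]:
        """Ids of up to k indexed vectors closest to q among bucket collisions."""
        cand = set()
        for table, bucket in zip(self.tables, self.buckets):
            cand.update(bucket.get(self._key(table, q), []))  # .get does not insert
        # Exact re-ranking of the (small) candidate set only.
        return sorted(cand, key=lambda i: math.dist(self.data[i], q))[:k]
```

Because a query can land in buckets that contain no indexed vectors, `query` may return fewer than `k` ids, possibly none. A caller would then need some fallback, for instance a larger `width` or an exhaustive search; that policy is left open here.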
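The abstract does not detail the graphical model of motion. One simple way to obtain the described smoothing effect is assumed here for illustration. It uses a chain-structured model whose hidden state at each frame is one of that frame's posture candidates. The node score is the candidate's log-likelihood, and the edge score penalizes the distance between consecutive postures. The Viterbi algorithm then recovers the highest-scoring sequence. The function name `smooth_pose_sequence`, the Euclidean pose distance, and the `smoothness` weight are hypothetical. For strictly online use, the same recursion could be run over a short sliding window (fixed-lag smoothing), at the cost of a delay of a few frames.

```python
import math
from typing import List, Sequence


def smooth_pose_sequence(candidates: Sequence[Sequence[Sequence[float]]],
                         log_likelihoods: Sequence[Sequence[float]],
                         smoothness: float = 1.0) -> List[int]:
    """Viterbi decoding over per-frame posture candidates.

    candidates[t][j]      : pose vector of candidate j at frame t (hypothetical encoding)
    log_likelihoods[t][j] : log-likelihood of that candidate given frame t's voxels
    Returns the chosen candidate index for every frame.
    """
    if not candidates:
        return []
    score = list(log_likelihoods[0])
    back: List[List[int]] = []  # back[t-1][j]: best predecessor of candidate j at frame t
    for t in range(1, len(candidates)):
        prev = candidates[t - 1]
        new_score: List[float] = []
        ptr: List[int] = []
        for pose, obs in zip(candidates[t], log_likelihoods[t]):
            # Transition score: penalize large jumps between consecutive postures.
            trans = [score[i] - smoothness * math.dist(prev[i], pose)
                     for i in range(len(prev))]
            best_i = max(range(len(trans)), key=trans.__getitem__)
            new_score.append(trans[best_i] + obs)
            ptr.append(best_i)
        score = new_score
        back.append(ptr)
    # Backtrack from the best final candidate.
    path = [max(range(len(score)), key=score.__getitem__)]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]
```

With `smoothness=0.0` the recursion reduces to picking the highest-likelihood candidate independently in each frame; larger values trade responsiveness for stability by rejecting isolated, distant outliers.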
