Human Pose Estimation in Stereo Images

In this paper, we address the problem of 3D human body pose estimation from depth images acquired by a stereo camera. Compared to the Kinect sensor, stereo cameras work outdoors and have a much higher operational range, but they produce noisier data. To deal with such data, we propose a framework for 3D human pose estimation that relies on random forests. The first contribution is a novel grid-based shape descriptor that is robust to noisy stereo data and can be used by any classifier. The second contribution is a two-step classification procedure that first classifies the body orientation and then determines the full 3D pose within this orientation cluster. To validate our method, we introduce a dataset recorded with a stereo camera synchronized with an optical motion capture system that provides ground-truth human body poses.
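
The two contributions can be illustrated with a minimal sketch. It is not the descriptor or the classifiers of the paper, whose details the abstract does not give. The assumptions here are: the depth patch is a list of rows with `None` for pixels without valid stereo depth, each grid cell is summarized by its mean depth relative to the patch median, and the classifiers are passed in as plain callables standing in for trained random forests. The names `grid_descriptor` and `two_step_predict` are hypothetical.

```python
from statistics import median
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Depth = Optional[float]  # None marks a pixel with no valid stereo depth (assumption)
Classifier = Callable[[List[float]], str]  # stand-in for a trained random forest


def grid_descriptor(depth: Sequence[Sequence[Depth]],
                    rows: int = 4, cols: int = 3) -> List[float]:
    """Mean depth of each grid cell, relative to the median depth of the patch.

    Averaging over a cell damps per-pixel stereo noise; invalid pixels are skipped.
    """
    h, w = len(depth), len(depth[0])
    valid = [d for row in depth for d in row if d is not None]
    if not valid:
        return [0.0] * (rows * cols)
    ref = median(valid)
    features = []
    for r in range(rows):
        r0, r1 = r * h // rows, (r + 1) * h // rows
        for c in range(cols):
            c0, c1 = c * w // cols, (c + 1) * w // cols
            cell = [depth[y][x] for y in range(r0, r1) for x in range(c0, c1)
                    if depth[y][x] is not None]
            # An empty cell carries no evidence, so it is encoded as 0.0.
            features.append(sum(cell) / len(cell) - ref if cell else 0.0)
    return features


def two_step_predict(features: List[float],
                     orientation_clf: Classifier,
                     pose_clfs: Dict[str, Classifier]) -> Tuple[str, str]:
    """Step 1: classify body orientation. Step 2: classify the pose with the
    classifier that was trained for that orientation cluster only."""
    orientation = orientation_clf(features)
    return orientation, pose_clfs[orientation](features)


# Toy usage with hand-written stand-ins for the trained classifiers.
patch = [[1.0, 1.0, 3.0, 3.0],
         [1.0, None, 3.0, 3.0],
         [2.0, 2.0, 4.0, 4.0],
         [2.0, 2.0, 4.0, None]]
feats = grid_descriptor(patch, rows=2, cols=2)
result = two_step_predict(
    feats,
    orientation_clf=lambda f: "front" if f[0] < 0 else "back",
    pose_clfs={"front": lambda f: "arms_up", "back": lambda f: "arms_down"},
)
```

Routing each sample to an orientation-specific pose classifier means the second stage only has to separate poses that share a similar appearance, which is the motivation the abstract gives for the two-step procedure.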
