Human Pose Regression Through Multiview Visual Fusion

We consider the problem of estimating 3-D human body pose from visual signals within a discriminative framework. It is challenging because there is a wide gap between complex 3-D human motion and planar visual observation, which makes this a severely ill-conditioned problem. In this paper, we focus on three critical factors to tackle human body pose estimation, namely, feature extraction, learning algorithm, and camera utilization. On the feature level, we describe images using the salient interest points represented by scale-invariant feature transform (SIFT)-like descriptors, in which the position, appearance, and local structural information are encoded simultaneously. On the learning algorithm level, we propose to use Gaussian processes and multiple linear (ML) regression to model the mapping between poses and features. Fusing image information from multiple cameras in different views is of great interest to us on the camera level. We make a comprehensive evaluation on the HumanEva database and get two meaningful insights into the three crucial aspects for human pose estimation: 1) although the choice of feature is very important to the problem, once the learning algorithm becomes efficient, the choice of feature is no longer critical, and 2) the impact of information combination from multiple cameras on pose estimation is closely related to not only the quantity of image information, but also its quality. In most cases, it is true that the more information is involved, the better results can be achieved. But when the information quantity is the same, the differences in quality will lead to totally different performance. Furthermore, dense evaluations demonstrate that our approach is an accurate and robust solution to the human body pose estimation problem.

[1]  L. Davis,et al.  el-based tracking of humans in action: , 1996 .

[2]  Pietro Perona,et al.  A Bayesian hierarchical model for learning natural scene categories , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[3]  Olivier D. Faugeras,et al.  3D articulated models and multi-view tracking with silhouettes , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[4]  Mohan M. Trivedi,et al.  Human Body Model Acquisition and Tracking Using Voxel Data , 2003, International Journal of Computer Vision.

[5]  Michael J. Black,et al.  HumanEva: Synchronized Video and Motion Capture Dataset for Evaluation of Articulated Human Motion , 2006 .

[6]  A. M. Tekalp,et al.  Multiple camera tracking of interacting and occluded human motion , 2001, Proc. IEEE.

[7]  Trevor Darrell,et al.  Fast pose estimation with parameter-sensitive hashing , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[8]  Matthew Brand,et al.  Shadow puppetry , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[9]  Jitendra Malik,et al.  Recovering 3D human body configurations using shape contexts , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[10]  Ankur Agarwal,et al.  Recovering 3D human pose from monocular images , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11]  Rómer Rosales,et al.  Learning Body Pose via Specialized Maps , 2001, NIPS.

[12]  Matthijs C. Dorst Distinctive Image Features from Scale-Invariant Keypoints , 2011 .

[13]  Shuicheng Yan,et al.  Flexible X-Y patches for face recognition , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.

[14]  Allen Y. Yang,et al.  Robust Face Recognition via Sparse Representation , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15]  Yun Fu,et al.  Temporal-Spatial Local Gaussian Process Experts for Human Pose Estimation , 2009, ACCV.

[16]  Xu Zhao,et al.  Generative tracking of 3D human motion by hierarchical annealed genetic algorithm , 2008, Pattern Recognit..

[17]  Trevor Darrell,et al.  Sparse probabilistic regression for activity-independent human pose inference , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[18]  Charles V. Stewart,et al.  Keypoint Descriptors for Matching Across Multiple Image Modalities and Non-linear Intensity Variations , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[19]  Hans-Peter Seidel,et al.  Drift-free tracking of rigid and articulated objects , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[20]  Andrew Blake,et al.  Articulated body motion capture by annealed particle filtering , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[21]  Mun Wai Lee,et al.  A model-based approach for estimating human 3D poses in static images , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[22]  Neil D. Lawrence,et al.  Gaussian Process Latent Variable Models for Human Pose Estimation , 2007, MLMI.

[23]  Ankur Agarwal,et al.  Tracking Articulated Motion Using a Mixture of Autoregressive Models , 2004, ECCV.

[24]  Fabio Remondino,et al.  3-D reconstruction of static human body shape from image sequence , 2004, Comput. Vis. Image Underst..

[25]  Pascal Fua,et al.  Articulated Soft Objects for Video-based Body Modeling , 2001, ICCV.

[26]  Ahmed M. Elgammal,et al.  Modeling View and Posture Manifolds for Tracking , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[27]  A. Elgammal,et al.  Inferring 3D body pose from silhouettes using activity manifold learning , 2004, CVPR 2004.

[28]  Ioannis A. Kakadiaris,et al.  Three-Dimensional Human Body Model Acquisition from Multiple Views , 1998, International Journal of Computer Vision.

[29]  Cristian Sminchisescu,et al.  Discriminative density propagation for 3D human motion estimation , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[30]  Larry S. Davis,et al.  3-D model-based tracking of humans in action: a multi-view approach , 1996, Proceedings CVPR IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[31]  Thomas S. Huang,et al.  Discriminative estimation of 3D human pose using Gaussian processes , 2008, 2008 19th International Conference on Pattern Recognition.

[32]  Yun Fu,et al.  Multiple feature fusion by subspace learning , 2008, CIVR '08.

[33]  Stefano Soatto,et al.  Fast Human Pose Estimation using Appearance and Motion via Multi-Dimensional Boosting Regression , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[34]  S. Weisberg Applied Linear Regression , 1981 .

[35]  Carl E. Rasmussen,et al.  Gaussian processes for machine learning , 2005, Adaptive computation and machine learning.

[36]  Christopher G. Harris,et al.  A Combined Corner and Edge Detector , 1988, Alvey Vision Conference.

[37]  Hans-Peter Seidel,et al.  Optimization and Filtering for Human Motion Capture , 2010, International Journal of Computer Vision.

[38]  Tieniu Tan,et al.  People tracking based on motion model and motion constraints with automatic initialization , 2004, Pattern Recognit..

[39]  Yihong Gong,et al.  Discriminative learning of visual words for 3D human pose estimation , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[40]  Michael J. Black,et al.  HumanEva: Synchronized Video and Motion Capture Dataset and Baseline Algorithm for Evaluation of Articulated Human Motion , 2010, International Journal of Computer Vision.

[41]  Adrian Hilton,et al.  A survey of advances in vision-based human motion capture and analysis , 2006, Comput. Vis. Image Underst..