Paper information - Heterogeneous Multi-task Learning for Human Pose Estimation with Deep Convolutional Neural Network

We propose a heterogeneous multi-task learning framework for human pose estimation from monocular images with a deep convolutional neural network. In particular, we simultaneously learn a pose-joint regressor and a sliding-window body-part detector in a single deep network architecture. We show that including the body-part detection task helps to regularize the network, directing it to converge to a good solution. We report competitive and state-of-the-art results on several data sets. We also empirically show that the learned neurons in the middle layer of our network are tuned to localized body parts.
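To make the idea of jointly training a regressor and a detector concrete, the following is a minimal sketch of a combined objective over shared features. It is not the authors' implementation: the abstract does not specify the loss functions, so the squared-error regression term, the cross-entropy detection term, the linear heads, and the names `multitask_loss`, `w_reg`, `w_det`, and `det_weight` are all assumptions made for illustration.

```python
import numpy as np


def sigmoid(z):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


def multitask_loss(features, w_reg, w_det, joints, part_labels, det_weight=1.0):
    """Weighted sum of a regression loss and a detection loss (hypothetical).

    features:    (N, D) shared features, standing in for the network's shared layers
    w_reg:       (D, 2J) linear regression head for J joints (assumed)
    w_det:       (D, P) linear detection head for P body parts (assumed)
    joints:      (N, 2J) ground-truth joint coordinates
    part_labels: (N, P) binary labels, 1 if the part is present in the window
    """
    # Regression task: squared error between predicted and true joint positions.
    joint_pred = features @ w_reg
    reg_loss = np.mean(np.sum((joint_pred - joints) ** 2, axis=1))

    # Detection task: binary cross-entropy on body-part presence.
    part_prob = sigmoid(features @ w_det)
    eps = 1e-12  # guards against log(0)
    det_loss = -np.mean(
        part_labels * np.log(part_prob + eps)
        + (1.0 - part_labels) * np.log(1.0 - part_prob + eps)
    )

    # Both tasks share `features`, so the detection term also shapes
    # the representation used by the regressor.
    total = reg_loss + det_weight * det_loss
    return total, reg_loss, det_loss
```

In this sketch, setting `det_weight` to zero reduces the objective to plain joint regression, which is one way to compare training with and without the auxiliary detection task.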

Antoni B. Chan | Zhi-Qiang Liu | Sijin Li
