Paper information - Heterogeneous Multi-task Learning for Human Pose Estimation with Deep Convolutional Neural Network

We propose a heterogeneous multi-task learning framework for human pose estimation from monocular images using a deep convolutional neural network. In particular, we simultaneously learn a human pose regressor and sliding-window body-part and joint-point detectors in a deep network architecture. We show that including the detection tasks helps to regularize the network, directing it to converge to a good solution. We report competitive and state-of-the-art results on several datasets. We also empirically show that the learned neurons in the middle layer of our network are tuned to localized body parts.
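The joint objective described in the abstract, a pose regression task trained together with detection tasks, can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and is not taken from the paper: the squared-error form of the regression loss, the binary cross-entropy form of the detection loss, the `det_weight` balancing factor, and all function names.

```python
import math


def squared_error(pred, target):
    """Regression term: sum of squared differences over joint coordinates (assumed form)."""
    return sum((p - t) ** 2 for p, t in zip(pred, target))


def binary_cross_entropy(probs, labels, eps=1e-12):
    """Detection term: mean binary cross-entropy over detector outputs (assumed form)."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)


def multitask_loss(pose_pred, pose_true, det_probs, det_labels, det_weight=1.0):
    """Weighted sum of the regression and detection terms.

    det_weight is a hypothetical balancing factor, not a value from the paper.
    """
    regression = squared_error(pose_pred, pose_true)
    detection = binary_cross_entropy(det_probs, det_labels)
    return regression + det_weight * detection


if __name__ == "__main__":
    # Two joint coordinates and two detector windows, all values made up.
    loss = multitask_loss([0.4, 0.6], [0.5, 0.5], [0.9, 0.2], [1, 0])
    print(f"combined loss: {loss:.4f}")
```

In a shared network, both terms would backpropagate through the same lower layers, which is the mechanism by which the abstract says the detection tasks help regularize the regressor.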

Antoni B. Chan | Zhi-Qiang Liu | Sijin Li
