论文信息 - DeeperCut: A Deeper, Stronger, and Faster Multi-person Pose Estimation Model

DeeperCut: A Deeper, Stronger, and Faster Multi-person Pose Estimation Model

The goal of this paper is to advance the state-of-the-art of articulated pose estimation in scenes with multiple people. To that end we contribute on three fronts. We propose (1) improved body part detectors that generate effective bottom-up proposals for body parts; (2) novel image-conditioned pairwise terms that allow to assemble the proposals into a variable number of consistent body part configurations; and (3) an incremental optimization strategy that explores the search space more efficiently thus leading both to better performance and significant speed-up factors. Evaluation is done on two single-person and two multi-person pose estimation benchmarks. The proposed approach significantly outperforms best known multi-person pose estimation results while demonstrating competitive performance on the task of single person pose estimation (Models and code available at http://pose.mpi-inf.mpg.de).

[1] Bernt Schiele,et al. Discriminative Appearance Models for Pictorial Structures , 2011, International Journal of Computer Vision.

[2] Dumitru Erhan,et al. Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3] Iasonas Kokkinos,et al. Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs , 2014, ICLR.

[4] Alan L. Yuille,et al. Articulated Pose Estimation by a Graphical Model with Image Dependent Pairwise Relations , 2014, NIPS.

[5] Jonathan Tompson,et al. Joint Training of a Convolutional Network and a Graphical Model for Human Pose Estimation , 2014, NIPS.

[6] Peter V. Gehler,et al. Strong Appearance and Expressive Spatial Models for Human Pose Estimation , 2013, 2013 IEEE International Conference on Computer Vision.

[7] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[8] Trevor Darrell,et al. Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9] Christian Szegedy,et al. DeepPose: Human Pose Estimation via Deep Neural Networks , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[10] Jonathan Tompson,et al. Efficient object localization using Convolutional Networks , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[12] Alan L. Yuille,et al. Parsing occluded people by flexible compositions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13] Varun Ramakrishna,et al. Pose Machines: Articulated Pose Estimation via Inference Machines , 2014, ECCV.

[14] Zhuowen Tu,et al. Deeply-Supervised Nets , 2014, AISTATS.

[15] Hao Jiang,et al. Global pose estimation using non-tree models , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[16] Andrew Zisserman,et al. Flowing ConvNets for Human Pose Estimation in Videos , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[17] Vittorio Ferrari,et al. We Are Family: Joint Pose Estimation of Multiple Persons , 2010, ECCV.

[18] Yi Li,et al. Beyond Physical Connections: Tree Models in Human Pose Estimation , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[19] Mark Everingham,et al. Clustered Pose and Nonlinear Appearance Models for Human Pose Estimation , 2010, BMVC.

[20] Anthony Wirth,et al. Correlation Clustering , 2010, Encyclopedia of Machine Learning and Data Mining.

[21] Jitendra Malik,et al. Articulated Pose Estimation Using Discriminative Armlet Classifiers , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[22] Andrew Zisserman,et al. Human Pose Estimation Using a Joint Pixel-wise and Part-wise Formulation , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[23] Ross B. Girshick,et al. Fast R-CNN , 2015, 1504.08083.

[24] Mark Everingham,et al. Learning effective human pose estimation from inaccurate annotation , 2011, CVPR 2011.

[25] Kang Zheng,et al. Combining local appearance and holistic view: Dual-Source Deep Neural Networks for human pose estimation , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[26] Peter V. Gehler,et al. Poselet Conditioned Pictorial Structures , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[27] Peter V. Gehler,et al. DeepCut: Joint Subset Partition and Labeling for Multi Person Pose Estimation , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[28] Bernt Schiele,et al. 2D Human Pose Estimation: New Benchmark and State of the Art Analysis , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[29] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[30] Varun Ramakrishna,et al. Convolutional Pose Machines , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31] Kaiming He,et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[32] Ben Taskar,et al. MODEC: Multimodal Decomposable Models for Human Pose Estimation , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[33] David A. Forsyth,et al. Improved Human Parsing with a Full Relational Model , 2010, ECCV.

[34] Yi Yang,et al. Parsing Occluded People , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[35] Jitendra Malik,et al. Human Pose Estimation with Iterative Error Feedback , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[36] Amos Fiat,et al. Correlation clustering in general weighted graphs , 2006, Theor. Comput. Sci..

[37] Yi Yang,et al. Articulated Human Detection with Flexible Mixtures of Parts , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[38] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).