Pose Partition Networks for Multi-person Pose Estimation

This paper proposes a novel Pose Partition Network (PPN) to address the challenging multi-person pose estimation problem. The proposed PPN is favorably featured by low complexity and high accuracy of joint detection and partition. In particular, PPN performs dense regressions from global joint candidates within a specific embedding space, which is parameterized by centroids of persons, to efficiently generate robust person detection and joint partition. Then, PPN infers body joint configurations through conducting graph partition for each person detection locally, utilizing reliable global affinity cues. In this way, PPN reduces computation complexity and improves multi-person pose estimation significantly. We implement PPN with the Hourglass architecture as the backbone network to simultaneously learn joint detector and dense regressor. Extensive experiments on benchmarks MPII Human Pose Multi-Person, extended PASCAL-Person-Part, and WAF show the efficiency of PPN with new state-of-the-art performance.

[1]  Bernt Schiele,et al.  2D Human Pose Estimation: New Benchmark and State of the Art Analysis , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[2]  Yaser Sheikh,et al.  OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Jonathan Tompson,et al.  Towards Accurate Multi-person Pose Estimation in the Wild , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Thomas Brox,et al.  Joint Graph Decomposition & Node Labeling: Problem, Algorithms, Applications , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Zhiao Huang,et al.  Associative Embedding: End-to-End Learning for Joint Detection and Grouping , 2016, NIPS.

[7]  Jitendra Malik,et al.  Poselets: Body part detectors trained using 3D human pose annotations , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[8]  Andrew Zisserman,et al.  Spatial Transformer Networks , 2015, NIPS.

[9]  Varun Ramakrishna,et al.  Convolutional Pose Machines , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11]  Silvio Savarese,et al.  Articulated part-based model for joint object detection and pose estimation , 2011, 2011 International Conference on Computer Vision.

[12]  Bernt Schiele,et al.  Articulated Multi-person Tracking in the Wild , 2016, ArXiv.

[13]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[14]  Vittorio Ferrari,et al.  We Are Family: Joint Pose Estimation of Multiple Persons , 2010, ECCV.

[15]  Jia Deng,et al.  Stacked Hourglass Networks for Human Pose Estimation , 2016, ECCV.

[16]  Peng Wang,et al.  Joint Multi-person Pose Estimation and Semantic Part Segmentation , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Jitendra Malik,et al.  Using k-Poselets for Detecting People and Localizing Their Keypoints , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[18]  Bernt Schiele,et al.  Articulated people detection and pose estimation: Reshaping the future , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[19]  Peter V. Gehler,et al.  DeepCut: Joint Subset Partition and Labeling for Multi Person Pose Estimation , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Andrew Zisserman,et al.  Human Pose Estimation Using a Joint Pixel-wise and Part-wise Formulation , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[21]  Bernt Schiele,et al.  ArtTrack: Articulated Multi-Person Tracking in the Wild , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Alan L. Yuille,et al.  Parsing occluded people by flexible compositions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .

[24]  Cewu Lu,et al.  RMPE: Regional Multi-person Pose Estimation , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[25]  Juergen Gall,et al.  Multi-person Pose Estimation with Local Joint-to-Person Associations , 2016, ECCV Workshops.

[26]  Bernt Schiele,et al.  DeeperCut: A Deeper, Stronger, and Faster Multi-person Pose Estimation Model , 2016, ECCV.

[27]  Sanja Fidler,et al.  Detect What You Can: Detecting and Representing Objects Using Holistic Models and Body Parts , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.