Joint COCO and Mapillary Workshop at ICCV 2019 Keypoint Detection Challenge Track Technical Report: Distribution-Aware Coordinate Representation for Human Pose Estimation

In this paper, we focus on the coordinate representation in human pose estimation. While being the standard choice, heatmap based representation has not been systematically investigated. We found that the process of coordinate decoding (i.e. transforming the predicted heatmaps to the coordinates) is surprisingly significant for human pose estimation performance, which nevertheless was not recognised before. In light of the discovered importance, we further probe the design limitations of the standard coordinate decoding method and propose a principled distribution-aware decoding method. Meanwhile, we improve the standard coordinate encoding process (i.e. transforming ground-truth coordinates to heatmaps) by generating accurate heatmap distributions for unbiased model training. Taking them together, we formulate a novel Distribution-Aware coordinate Representation for Keypoint (DARK) method. Serving as a model-agnostic plug-in, DARK significantly improves the performance of a variety of state-of-the-art human pose estimation models. Extensive experiments show that DARK yields the best results on COCO keypoint detection challenge, validating the usefulness and effectiveness of our novel coordinate representation idea. The project page containing more details is at this https URL

[1]  Larry S. Davis,et al.  SNIPER: Efficient Multi-Scale Training , 2018, NeurIPS.

[2]  Yichen Wei,et al.  Simple Baselines for Human Pose Estimation and Tracking , 2018, ECCV.

[3]  Andrew Zisserman,et al.  Flowing ConvNets for Human Pose Estimation in Videos , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[4]  Christian Szegedy,et al.  DeepPose: Human Pose Estimation via Deep Neural Networks , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[5]  Gang Yu,et al.  Cascaded Pyramid Network for Multi-person Pose Estimation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[6]  Varun Ramakrishna,et al.  Convolutional Pose Machines , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Dong Liu,et al.  Deep High-Resolution Representation Learning for Human Pose Estimation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Jia Deng,et al.  Stacked Hourglass Networks for Human Pose Estimation , 2016, ECCV.

[9]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[10]  Jonathan Tompson,et al.  Joint Training of a Convolutional Network and a Graphical Model for Human Pose Estimation , 2014, NIPS.

[11]  Kai Chen,et al.  Hybrid Task Cascade for Instance Segmentation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Xiaogang Wang,et al.  Learning Feature Pyramids for Human Pose Estimation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).