AID: Pushing the Performance Boundary of Human Pose Estimation with Information Dropping Augmentation

Both appearance cue and constraint cue are vital for human pose estimation. However, there is a tendency in most existing works to overfitting the former and overlook the latter. In this paper, we propose Augmentation by Information Dropping (AID) to verify and tackle this dilemma. Alone with AID as a prerequisite for effectively exploiting its potential, we propose customized training schedules, which are designed by analyzing the pattern of loss and performance in training process from the perspective of information supplying. In experiments, as a model-agnostic approach, AID promotes various state-of-the-art methods in both bottom-up and top-down paradigms with different input sizes, frameworks, backbones, training and testing sets. On popular COCO human pose estimation test set, AID consistently boosts the performance of different configurations by around 0.6 AP in top-down paradigm and up to 1.5 AP in bottom-up paradigm. On more challenging CrowdPose dataset, the improvement is more than 1.5 AP. As AID successfully pushes the performance boundary of human pose estimation problem by considerable margin and sets a new state-of-the-art, we hope AID to be a regular configuration for training human pose estimators. The source code will be publicly available for further research.

[1]  Alexandre Alahi,et al.  PifPaf: Composite Fields for Human Pose Estimation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Yaser Sheikh,et al.  OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3]  Cewu Lu,et al.  RMPE: Regional Multi-person Pose Estimation , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[4]  Yiming Hu,et al.  Convolutional relation network for skeleton-based action recognition , 2019, Neurocomputing.

[5]  Jonathan Tompson,et al.  Towards Accurate Multi-person Pose Estimation in the Wild , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Jonathan Tompson,et al.  PersonLab: Person Pose Estimation and Instance Segmentation with a Bottom-Up, Part-Based, Geometric Embedding Model , 2018, ECCV.

[7]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8]  Jonathan Tompson,et al.  Joint Training of a Convolutional Network and a Graphical Model for Human Pose Estimation , 2014, NIPS.

[9]  Bernt Schiele,et al.  Pictorial structures revisited: People detection and articulated pose estimation , 2009, CVPR.

[10]  Xiangyu Zhang,et al.  Learning Delicate Local Representations for Multi-Person Pose Estimation , 2020, ECCV.

[11]  Kyoung Mu Lee,et al.  PoseFix: Model-Agnostic General Human Pose Refinement Network , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Yi Yang,et al.  Random Erasing Data Augmentation , 2017, AAAI.

[13]  Graham W. Taylor,et al.  Improved Regularization of Convolutional Neural Networks with Cutout , 2017, ArXiv.

[14]  Hengshuang Zhao,et al.  GridMask Data Augmentation , 2020, ArXiv.

[15]  Yong Jae Lee,et al.  Hide-and-Seek: A Data Augmentation Technique for Weakly-Supervised Localization and Beyond , 2018, ArXiv.

[16]  Emre Akbas,et al.  MultiPoseNet: Fast Multi-Person Pose Estimation using Pose Residual Network , 2018, ECCV.

[17]  Ross B. Girshick,et al.  Mask R-CNN , 2017, 1703.06870.

[18]  Bo Zhao,et al.  AI Challenger : A Large-scale Dataset for Going Deeper in Image Understanding , 2017, ArXiv.

[19]  Hao Zhu,et al.  CrowdPose: Efficient Crowded Scenes Pose Estimation and a New Benchmark , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Zhiao Huang,et al.  Associative Embedding: End-to-End Learning for Joint Detection and Grouping , 2016, NIPS.

[21]  Guan Huang,et al.  The Devil Is in the Details: Delving Into Unbiased Data Processing for Human Pose Estimation , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Gang Yu,et al.  Cascaded Pyramid Network for Multi-person Pose Estimation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[23]  Kai Chen,et al.  Hybrid Task Cascade for Instance Segmentation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Xiaogang Wang,et al.  Learning Feature Pyramids for Human Pose Estimation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[25]  Mao Ye,et al.  Distribution-Aware Coordinate Representation for Human Pose Estimation , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Christian Szegedy,et al.  DeepPose: Human Pose Estimation via Deep Neural Networks , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[27]  Jan C. van Gemert,et al.  Tilting at windmills: Data augmentation for deep pose estimation does not help with occlusions , 2020, 2020 25th International Conference on Pattern Recognition (ICPR).

[28]  Enhua Wu,et al.  Squeeze-and-Excitation Networks , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[29]  Yichen Wei,et al.  Integral Human Pose Regression , 2017, ECCV.

[30]  Jian Wang,et al.  Graph-PCNN: Two Stage Human Pose Estimation with Graph Pose Refinement , 2020, ECCV.

[31]  Anil K. Jain,et al.  Towards Universal Representation Learning for Deep Face Recognition , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Wenhai Wang,et al.  Differentiable Hierarchical Graph Grouping for Multi-Person Pose Estimation , 2020, ECCV.

[33]  Varun Ramakrishna,et al.  Convolutional Pose Machines , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  Dong Liu,et al.  Deep High-Resolution Representation Learning for Human Pose Estimation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Wei Jiang,et al.  Bag of Tricks and a Strong Baseline for Deep Person Re-Identification , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[36]  Jia Deng,et al.  Stacked Hourglass Networks for Human Pose Estimation , 2016, ECCV.

[37]  Ruigang Yang,et al.  Human Pose Estimation with Spatial Contextual Information , 2019, ArXiv.

[38]  Honggang Qi,et al.  Multi-Scale Structure-Aware Network for Human Pose Estimation , 2018, ECCV.

[39]  Peter V. Gehler,et al.  DeepCut: Joint Subset Partition and Labeling for Multi Person Pose Estimation , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[40]  Gang Yu,et al.  Rethinking on Multi-Stage Networks for Human Pose Estimation , 2019, ArXiv.

[41]  Yichen Wei,et al.  Simple Baselines for Human Pose Estimation and Tracking , 2018, ECCV.

[42]  Shifeng Zhang,et al.  PedHunter: Occlusion Robust Pedestrian Detector in Crowded Scenes , 2019, AAAI.

[43]  Wei Zou,et al.  Action Machine: Toward Person-Centric Action Recognition in Videos , 2019, IEEE Signal Processing Letters.

[44]  Bernt Schiele,et al.  DeeperCut: A Deeper, Stronger, and Faster Multi-person Pose Estimation Model , 2016, ECCV.

[45]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[46]  Dongdong Yu,et al.  A Context-and-Spatial Aware Network for Multi-Person Pose Estimation , 2019, ArXiv.

[47]  Andrew Zisserman,et al.  Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).