Data augmentation method for improving the accuracy of human pose estimation with cropped images

Abstract Neural networks have improved the accuracy of human pose estimation from a single RGB image. However, such estimation remains difficult, especially when the human body is only partially visible due to a limited field of view of the camera or occlusions. In this paper, we introduce a data augmentation method called body-cropping augmentation (BCA), which generalizes the dataset for effective training in human pose estimation. This technique includes the policies of data generation and the training strategy using the augmented data. The experiments with the COCO val2017 dataset with ground-truth bounding boxes show BCA consistently enhances accuracies of state-of-the-art neural networks by an average of 1.08% without any modification to the network architecture. Moreover, the proposed BCA technique effectively reduces the false negatives of localizing keypoints, especially in an input image with a few visible keypoints.

[1]  Bernt Schiele,et al.  2D Human Pose Estimation: New Benchmark and State of the Art Analysis , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[2]  Emre Akbas,et al.  MultiPoseNet: Fast Multi-Person Pose Estimation using Pose Residual Network , 2018, ECCV.

[3]  Ross B. Girshick,et al.  Mask R-CNN , 2017, 1703.06870.

[4]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[5]  Dong Liu,et al.  Deep High-Resolution Representation Learning for Human Pose Estimation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[7]  Pietro Perona,et al.  Benchmarking and Error Diagnosis in Multi-instance Pose Estimation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[8]  Jia Deng,et al.  Stacked Hourglass Networks for Human Pose Estimation , 2016, ECCV.

[9]  Yaser Sheikh,et al.  OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[10]  Luca Maria Gambardella,et al.  Deep, Big, Simple Neural Nets for Handwritten Digit Recognition , 2010, Neural Computation.

[11]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Yichen Wei,et al.  Simple Baselines for Human Pose Estimation and Tracking , 2018, ECCV.

[13]  Jonathan Tompson,et al.  PersonLab: Person Pose Estimation and Instance Segmentation with a Bottom-Up, Part-Based, Geometric Embedding Model , 2018, ECCV.

[14]  Gang Yu,et al.  Cascaded Pyramid Network for Multi-person Pose Estimation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[15]  Yi Yang,et al.  Random Erasing Data Augmentation , 2017, AAAI.

[16]  Honggang Qi,et al.  Multi-Scale Structure-Aware Network for Human Pose Estimation , 2018, ECCV.

[17]  Ali Farhadi,et al.  YOLO9000: Better, Faster, Stronger , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Mark Everingham,et al.  Learning effective human pose estimation from inaccurate annotation , 2011, CVPR 2011.

[19]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[20]  Ross B. Girshick,et al.  Focal Loss for Dense Object Detection , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.