Tilting at windmills: Data augmentation for deep pose estimation does not help with occlusions

Occlusion degrades the performance of human pose estimation. In this paper, we introduce targeted keypoint and body part occlusion attacks. The effects of the attacks are systematically analyzed on the best performing methods. In addition, we propose occlusion specific data augmentation techniques against keypoint and part attacks. Our extensive experiments show that human pose estimation methods are not robust to occlusion and data augmentation does not solve the occlusion problems.

[1]  Bernt Schiele,et al.  2D Human Pose Estimation: New Benchmark and State of the Art Analysis , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[2]  Hongyi Zhang,et al.  mixup: Beyond Empirical Risk Minimization , 2017, ICLR.

[3]  Bin Sun,et al.  FollowMeUp Sports: New Benchmark for 2D Human Keypoint Recognition , 2019, PRCV.

[4]  Michael J. Dinneen,et al.  Improved Mixed-Example Data Augmentation , 2018, 2019 IEEE Winter Conference on Applications of Computer Vision (WACV).

[5]  Stepán Obdrzálek,et al.  Accuracy and robustness of Kinect pose estimation in the context of coaching of elderly population , 2012, 2012 Annual International Conference of the IEEE Engineering in Medicine and Biology Society.

[6]  Xin Chen,et al.  Pose2Body: Pose-Guided Human Parts Segmentation , 2019, 2019 IEEE International Conference on Multimedia and Expo (ICME).

[7]  Juergen Gall,et al.  A semantic occlusion model for human pose estimation from a single depth image , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[8]  Abhishek Sharma,et al.  On the Robustness of Human Pose Estimation , 2019, CVPR Workshops.

[9]  Gang Yu,et al.  Rethinking on Multi-Stage Networks for Human Pose Estimation , 2019, ArXiv.

[10]  HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  David Picard,et al.  Multi-task Deep Learning for Real-Time 3D Human Pose Estimation and Action Recognition , 2020, IEEE transactions on pattern analysis and machine intelligence.

[12]  Dong Liu,et al.  Deep High-Resolution Representation Learning for Human Pose Estimation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Cordelia Schmid,et al.  Modeling Visual Context is Key to Augmenting Object Detection Datasets , 2018, ECCV.

[14]  Mao Ye,et al.  Distribution-Aware Coordinate Representation for Human Pose Estimation , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Lorenzo Torresani,et al.  Detect-and-Track: Efficient Pose Estimation in Videos , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[16]  N. Dinesh Reddy,et al.  Occlusion-Net: 2D/3D Occluded Keypoint Localization Using Graph Networks , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Hyunjung Shim,et al.  Attention-Based Dropout Layer for Weakly Supervised Single Object Localization and Semantic Segmentation , 2020, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[18]  Seong Joon Oh,et al.  CutMix: Regularization Strategy to Train Strong Classifiers With Localizable Features , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[19]  Ujjwal Bhattacharya,et al.  ClueNet : A Deep Framework for Occluded Pedestrian Pose Estimation , 2019, BMVC.

[20]  Alexandre Alahi,et al.  PifPaf: Composite Fields for Human Pose Estimation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Tatsuya Harada,et al.  Between-Class Learning for Image Classification , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[22]  Nuno Vasconcelos,et al.  Cascade R-CNN: Delving Into High Quality Object Detection , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[23]  Yaser Sheikh,et al.  OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[24]  Xiaogang Wang,et al.  Learning Feature Pyramids for Human Pose Estimation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[25]  Ioannis Mitliagkas,et al.  Manifold Mixup: Better Representations by Interpolating Hidden States , 2018, ICML.

[26]  Hongyu Guo,et al.  MixUp as Locally Linear Out-Of-Manifold Regularization , 2018, AAAI.

[27]  Andrew Zisserman,et al.  Human Pose Estimation Using a Joint Pixel-wise and Part-wise Formulation , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[28]  Geoff Nitschke,et al.  Improving Deep Learning using Generic Data Augmentation , 2017 .

[29]  Dani Lischinski,et al.  A Holistic Approach for Data-Driven Object Cutout , 2016, ACCV.

[30]  Yong Jae Lee,et al.  Hide-and-Seek: Forcing a Network to be Meticulous for Weakly-Supervised Object and Action Localization , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[31]  Henry Medeiros,et al.  Gaze Estimation for Assisted Living Environments , 2019, 2020 IEEE Winter Conference on Applications of Computer Vision (WACV).

[32]  Arne Schumann,et al.  Human Pose Estimation for Real-World Crowded Scenarios , 2019, 2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS).

[33]  Anton Konushin,et al.  Pose-based Deep Gait Recognition , 2017, IET Biom..

[34]  Ross B. Girshick,et al.  Focal Loss for Dense Object Detection , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[35]  Xiaogang Wang,et al.  Multi-context Attention for Human Pose Estimation , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  Jia Deng,et al.  Stacked Hourglass Networks for Human Pose Estimation , 2016, ECCV.

[37]  Christian Szegedy,et al.  DeepPose: Human Pose Estimation via Deep Neural Networks , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[38]  Diange Yang,et al.  Leverage of Limb Detection in Pose Estimation for Vulnerable Road Users , 2019, 2019 IEEE Intelligent Transportation Systems Conference (ITSC).

[39]  Quoc V. Le,et al.  EfficientDet: Scalable and Efficient Object Detection , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[40]  Yichen Wei,et al.  Simple Baselines for Human Pose Estimation and Tracking , 2018, ECCV.

[41]  Yi Yang,et al.  Random Erasing Data Augmentation , 2017, AAAI.

[42]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[43]  Jitendra Malik,et al.  Human Pose Estimation with Iterative Error Feedback , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[44]  Timo Aila,et al.  Consistency regularization and CutMix for semi-supervised semantic segmentation , 2019, ArXiv.

[45]  Song-Chun Zhu,et al.  Joint action recognition and pose estimation from video , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[46]  Graham W. Taylor,et al.  Improved Regularization of Convolutional Neural Networks with Cutout , 2017, ArXiv.

[47]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[48]  Subramanian Ramanathan,et al.  Joint Estimation of Human Pose and Conversational Groups from Social Scenes , 2018, International Journal of Computer Vision.

[49]  Peng Wang,et al.  Joint Multi-person Pose Estimation and Semantic Part Segmentation , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[50]  Guan Huang,et al.  The Devil Is in the Details: Delving Into Unbiased Data Processing for Human Pose Estimation , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[51]  Gang Yu,et al.  Cascaded Pyramid Network for Multi-person Pose Estimation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[52]  Antonio M. López,et al.  Intention Recognition of Pedestrians and Cyclists by 2D Pose Estimation , 2019, IEEE Transactions on Intelligent Transportation Systems.

[53]  Haoyu Wang,et al.  Pose Flow: Efficient Online Pose Tracking , 2018, BMVC.

[54]  Varun Ramakrishna,et al.  Convolutional Pose Machines , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[55]  Zicheng Liu,et al.  Cross-Domain Complementary Learning with Synthetic Data for Multi-Person Part Segmentation , 2019, ArXiv.

[56]  Haroon Idrees,et al.  Online Localization and Prediction of Actions and Interactions , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[57]  Taghi M. Khoshgoftaar,et al.  A survey on Image Data Augmentation for Deep Learning , 2019, Journal of Big Data.

[58]  Kaiming He,et al.  Focal Loss for Dense Object Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[59]  João Paulo Papa,et al.  Tracking and Re-identification of People Using Soft-Biometrics , 2019, 2019 XV Workshop de Visão Computacional (WVC).