Learning to Augment Poses for 3D Human Pose Estimation in Images and Videos

Existing 3D human pose estimation methods often generalize poorly to new datasets, largely due to the limited diversity of 2D-3D pose pairs in the training data. To address this problem, we present PoseAug, a novel auto-augmentation framework that learns to augment the available training poses towards greater diversity and thus improves the generalization of the trained 2D-to-3D pose estimator. Specifically, PoseAug introduces a novel pose augmentor that learns to adjust various geometry factors of a pose through differentiable operations. Because these operations are differentiable, the augmentor can be jointly optimized with the 3D pose estimator and take the estimation error as feedback to generate more diverse and harder poses in an online manner. PoseAug is generic and can be readily applied to various 3D pose estimation models. It can also be extended to aid pose estimation from video frames. To demonstrate this, we introduce PoseAug-V, a simple yet effective method that decomposes video pose augmentation into end pose augmentation and conditioned intermediate pose generation. Extensive experiments demonstrate that PoseAug and its extension PoseAug-V bring clear improvements for frame-based and video-based 3D pose estimation on several out-of-domain 3D human pose benchmarks.
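
To make "adjusting geometry factors of a pose" concrete, the following is a minimal sketch of one possible augmentation step, not the PoseAug implementation. It assumes a hypothetical 4-joint chain (`BONES`), and it takes the bone-length ratios, rotation angle, and camera depth as fixed inputs, whereas in the framework described above such parameters would be predicted by the learned augmentor. Each operation is a smooth function of its parameters, which is what would allow gradients from the estimation error to flow back to an augmentor. The pinhole projection and all function names are illustrative assumptions.

```python
import math

# Hypothetical skeleton: (child, parent) pairs, listed parent-first; joint 0 is the root.
BONES = [(1, 0), (2, 1), (3, 2)]


def scale_bones(pose, ratios):
    """Rescale each bone's length by its ratio while keeping its direction."""
    out = {0: pose[0]}
    for (child, parent), ratio in zip(BONES, ratios):
        vec = [c - p for c, p in zip(pose[child], pose[parent])]
        out[child] = tuple(o + ratio * v for o, v in zip(out[parent], vec))
    return [out[i] for i in range(len(pose))]


def rotate_y(pose, angle):
    """Rotate the pose about the vertical (y) axis passing through the root joint."""
    rx, ry, rz = pose[0]
    c, s = math.cos(angle), math.sin(angle)
    rotated = []
    for x, y, z in pose:
        dx, dz = x - rx, z - rz
        rotated.append((rx + c * dx + s * dz, y, rz - s * dx + c * dz))
    return rotated


def project(pose, focal=1.0, depth=4.0):
    """Pinhole projection after pushing the pose `depth` units in front of the camera."""
    return [(focal * x / (z + depth), focal * y / (z + depth)) for x, y, z in pose]


def augment(pose, ratios, angle, depth=4.0):
    """Return an augmented (3D pose, 2D pose) training pair."""
    pose3d = rotate_y(scale_bones(pose, ratios), angle)
    return pose3d, project(pose3d, depth=depth)


# Example: double the last bone and leave the orientation unchanged.
source_pose = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 2.0, 0.0), (1.0, 2.0, 0.0)]
aug3d, aug2d = augment(source_pose, ratios=[1.0, 1.0, 2.0], angle=0.0)
```

In a learned setting, the same operations would be written in an autodiff framework so that the ratios, angle, and camera position become trainable outputs rather than constants.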
