PARE: Part Attention Regressor for 3D Human Body Estimation

The Supplementary Material consists of this document and a video. They include acknowledgement, disclosure, additional information and visualizations of our method and results. Acknowledgements: We thank Joachim Tesch for helping with Blender rendering. We thank Nikos Athanasiou, Vassilis Choutas, Emre Aksan, Stefan Stevsics, Xu Chen, Cornelia Kohler, Marilyn Keller, Shashank Tripathi, Yinghao Huang, Omid Taheri, Sai Kumar Dwivedi, Hongwei Yi, Dimitris Tzionas, Timo Bolkart, Yao Feng, and all Perceiving Systems department members for their feedback and fruitful discussions. This research was partially supported by the Max Planck ETH Center for Learning Systems. Disclosure: https://files.is.tue.mpg.de/ black/CoI/ICCV2021.txt

[1]  Jie Song,et al.  Human Body Model Fitting by Learned Gradient Descent , 2020, ECCV.

[2]  Kyoung Mu Lee,et al.  Pose2Mesh: Graph Convolutional Network for 3D Human Pose and Mesh Recovery from a 2D Human Pose , 2020, ECCV.

[3]  Bernt Schiele,et al.  2D Human Pose Estimation: New Benchmark and State of the Art Analysis , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  Lukasz Kidzinski,et al.  3D Pose Detection in Videos: Focusing on Occlusion , 2020, ArXiv.

[5]  Bolei Zhou,et al.  Learning Deep Features for Discriminative Localization , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Dragomir Anguelov,et al.  SCAPE: shape completion and animation of people , 2005, ACM Trans. Graph..

[7]  David F. Fouhey,et al.  Full-Body Awareness from Partial Observations , 2020, ECCV.

[8]  Cristian Sminchisescu,et al.  Weakly Supervised 3D Human Pose and Shape Reconstruction with Normalizing Flows , 2020, ECCV.

[9]  Pascal Fua,et al.  Monocular 3D Human Pose Estimation in the Wild Using Improved CNN Supervision , 2016, 2017 International Conference on 3D Vision (3DV).

[10]  Andrew Zisserman,et al.  Sim2real transfer learning for 3D pose estimation: motion to the rescue , 2019, ArXiv.

[11]  Kai Oliver Arras,et al.  How Robust is 3D Human Pose Estimation to Occlusion? , 2018, ArXiv.

[12]  Zhengyi Luo,et al.  3D Human Motion Estimation via Motion Compression and Refinement , 2020, ACCV.

[13]  Ignas Budvytis,et al.  Synthetic Training for Accurate 3D Human Pose and Shape Estimation in the Wild , 2020, BMVC.

[14]  Abhishek Das,et al.  Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[15]  Andrea Vedaldi,et al.  Exemplar Fine-Tuning for 3D Human Model Fitting Towards In-the-Wild 3D Human Pose Estimation , 2020, 2021 International Conference on 3D Vision (3DV).

[16]  Michael J. Black,et al.  VIBE: Video Inference for Human Body Pose and Shape Estimation , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Cristian Sminchisescu,et al.  Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[18]  Jian Sun,et al.  Identity Mappings in Deep Residual Networks , 2016, ECCV.

[19]  Liu Wu,et al.  Human Mesh Recovery From Monocular Images via a Skeleton-Disentangled Representation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[20]  Abhinav Gupta,et al.  Non-local Neural Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[21]  Trevor Darrell,et al.  Inferring 3D structure with a statistical image-based shape model , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[22]  Juergen Gall,et al.  A semantic occlusion model for human pose estimation from a single depth image , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[23]  Jitendra Malik,et al.  Learning 3D Human Dynamics From Video , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Yu Sun,et al.  CenterHMR: a Bottom-up Single-shot Method for Multi-person 3D Mesh Recovery from a Single Image , 2020, ArXiv.

[25]  Michael J. Black,et al.  Detailed Human Shape and Pose from Images , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[26]  Ankur Agarwal,et al.  Recovering 3D human pose from monocular images , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[27]  Kyoung Mu Lee,et al.  I2L-MeshNet: Image-to-Lixel Prediction Network for Accurate 3D Human Pose and Mesh Estimation from a Single RGB Image , 2020, ECCV.

[28]  Peter V. Gehler,et al.  Keep It SMPL: Automatic Estimation of 3D Human Pose and Shape from a Single Image , 2016, ECCV.

[29]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[30]  Michael J. Black,et al.  Learning to Reconstruct 3D Human Pose and Shape via Model-Fitting in the Loop , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[31]  Kostas Daniilidis,et al.  Convolutional Mesh Regression for Single-Image Human Shape Reconstruction , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Andrea Vedaldi,et al.  Understanding deep image representations by inverting them , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Yi Yang,et al.  Parsing Occluded People , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[34]  Jitendra Malik,et al.  End-to-End Recovery of Human Shape and Pose , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[35]  Peter V. Gehler,et al.  Neural Body Fitting: Unifying Deep Learning and Model Based Human Pose and Shape Estimation , 2018, 2018 International Conference on 3D Vision (3DV).

[36]  Ming-Hsuan Yang,et al.  Estimating Human Pose from Occluded Images , 2009, ACCV.

[37]  Ignas Budvytis,et al.  Indirect deep structured learning for 3D human body shape and pose prediction , 2017, BMVC.

[38]  Ziyan Wu,et al.  Hierarchical Kinematic Human Mesh Recovery , 2020, ECCV.

[39]  Bodo Rosenhahn,et al.  Supplementary Material to: Recovering Accurate 3D Human Pose in The Wild Using IMUs and a Moving Camera , 2018 .

[40]  Dimitrios Tzionas,et al.  Monocular Expressive Body Regression through Body-Driven Attention , 2020, ECCV.

[41]  Katerina Fragkiadaki,et al.  Epipolar Transformers , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[42]  Maria A. Amer,et al.  Deep 3D Human Pose Estimation Under Partial Body Presence , 2018, 2018 25th IEEE International Conference on Image Processing (ICIP).

[43]  Peter V. Gehler,et al.  DeepCut: Joint Subset Partition and Labeling for Multi Person Pose Estimation , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[44]  Xiaowei Zhou,et al.  Learning to Estimate 3D Human Pose and Shape from a Single Color Image , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[45]  Dong Liu,et al.  Deep High-Resolution Representation Learning for Human Pose Estimation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[46]  Andrea Vedaldi,et al.  3D Multi-bodies: Fitting Sets of Plausible 3D Human Models to Ambiguous Image Data , 2020, NeurIPS.

[47]  Bo Wang,et al.  Occlusion-Aware Networks for 3D Human Pose Estimation in Video , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[48]  Dimitrios Tzionas,et al.  Expressive Body Capture: 3D Hands, Face, and Body From a Single Image , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[49]  Yangang Wang,et al.  Object-Occluded Human Shape and Pose Estimation From a Single Color Image , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[50]  Rob Fergus,et al.  Visualizing and Understanding Convolutional Networks , 2013, ECCV.

[51]  Michael J. Black,et al.  Combined discriminative and generative articulated pose and non-rigid shape estimation , 2007, NIPS.

[52]  Tatsuya Harada,et al.  Neural 3D Mesh Renderer , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[53]  Yu Cheng,et al.  3D Human Pose Estimation using Spatio-Temporal Networks with Explicit Occlusion Training , 2020, AAAI.

[54]  Xiaowei Zhou,et al.  Coherent Reconstruction of Multiple Humans From a Single Image , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[55]  Iasonas Kokkinos,et al.  HoloPose: Holistic 3D Human Reconstruction In-The-Wild , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[56]  Ling Shao,et al.  See More, Know More: Unsupervised Video Object Segmentation With Co-Attention Siamese Networks , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[57]  Ersin Yumer,et al.  Self-supervised Learning of Motion Capture , 2017, NIPS.

[58]  Peter V. Gehler,et al.  Unite the People: Closing the Loop Between 3D and 2D Human Representations , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[59]  Michael J. Black,et al.  The Naked Truth: Estimating Body Shape Under Clothing , 2008, ECCV.

[60]  Mark Everingham,et al.  Learning effective human pose estimation from inaccurate annotation , 2011, CVPR 2011.