Paper Information - Compositional Human Pose Regression

Abstract: Regression-based methods do not perform as well as detection-based methods for human pose estimation. A central problem is that previous regression methods do not exploit the structural information in the pose well. In this work, we propose a structure-aware regression approach. It adopts a reparameterized pose representation that uses bones instead of joints. It exploits the joint connection structure to define a compositional loss function that encodes the long-range interactions in the pose. It is simple, effective, and general for both 2D and 3D pose estimation in a unified setting. Comprehensive evaluation validates the effectiveness of our approach. It establishes a new state of the art on the Human3.6M dataset. It is also competitive on the MPII and COCO datasets.
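The abstract gives no formulas, so the following is only a minimal sketch of the two ideas it names: a bone-based pose representation and a loss composed over joint pairs. The 5-joint toy skeleton, the function names, the choice of an L1 error, and the two pair sets are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Hypothetical toy skeleton (assumption): PARENT[k] is the parent of joint k,
# and joint 0 is the root.
PARENT = [-1, 0, 1, 1, 3]

def joints_to_bones(joints):
    """Bone k is the offset from parent(k) to joint k; the root row stays zero."""
    bones = np.zeros_like(joints)
    for k, p in enumerate(PARENT):
        if p >= 0:
            bones[k] = joints[k] - joints[p]
    return bones

def root_offset(bones, k):
    """Sum the bones on the path from the root down to joint k."""
    total = np.zeros(bones.shape[1])
    while PARENT[k] >= 0:
        total += bones[k]
        k = PARENT[k]
    return total

def compositional_loss(pred_bones, gt_bones, pairs):
    """L1 error of the relative position of each joint pair (u, v).

    The relative position is composed from bones only; bones shared by both
    root paths cancel in the subtraction.
    """
    loss = 0.0
    for u, v in pairs:
        pred = root_offset(pred_bones, v) - root_offset(pred_bones, u)
        gt = root_offset(gt_bones, v) - root_offset(gt_bones, u)
        loss += np.abs(pred - gt).sum()
    return loss

# Toy 2D pose and a prediction with an error in a single bone (bone 1).
gt_joints = np.array([[0, 0], [0, 1], [0, 2], [1, 1], [2, 1]], dtype=float)
gt_bones = joints_to_bones(gt_joints)
pred_bones = gt_bones.copy()
pred_bones[1] += [0.1, 0.0]

adjacent = [(0, 1), (1, 2), (1, 3), (3, 4)]  # only directly connected joints
all_pairs = [(u, v) for u in range(5) for v in range(u + 1, 5)]

print(compositional_loss(pred_bones, gt_bones, adjacent))   # about 0.1
print(compositional_loss(pred_bones, gt_bones, all_pairs))  # about 0.4
```

In this toy setup, a single bone error is counted once when only adjacent joints are paired, but once for every pair whose connecting path crosses that bone when all pairs are used. That is one way to read "long-range interactions": errors near the root are penalized more because they displace more of the pose.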

Yichen Wei | Xiao Sun | Shuang Liang | Jiaxiang Shang

Venue: 2017 IEEE International Conference on Computer Vision (ICCV)