3D Human Pose Estimation in the Wild by Adversarial Learning

Recently, remarkable advances have been achieved in 3D human pose estimation from monocular images because of the powerful Deep Convolutional Neural Networks (DCNNs). Despite their success on large-scale datasets collected in the constrained lab environment, it is difficult to obtain the 3D pose annotations for in-the-wild images. Therefore, 3D human pose estimation in the wild is still a challenge. In this paper, we propose an adversarial learning framework, which distills the 3D human pose structures learned from the fully annotated dataset to in-the-wild images with only 2D pose annotations. Instead of defining hard-coded rules to constrain the pose estimation results, we design a novel multi-source discriminator to distinguish the predicted 3D poses from the ground-truth, which helps to enforce the pose estimator to generate anthropometrically valid poses even with images in the wild. We also observe that a carefully designed information source for the discriminator is essential to boost the performance. Thus, we design a geometric descriptor, which computes the pairwise relative locations and distances between body joints, as a new information source for the discriminator. The efficacy of our adversarial learning framework with the new geometric descriptor has been demonstrated through extensive experiments on widely used public benchmarks. Our approach significantly improves the performance compared with previous state-of-the-art approaches.

[1]  Deva Ramanan,et al.  3D Human Pose Estimation = 2D Pose Estimation + Matching , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Xiaogang Wang,et al.  End-to-End Learning of Deformable Mixture of Parts and Deep Convolutional Neural Networks for Human Pose Estimation , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Varun Ramakrishna,et al.  Convolutional Pose Machines , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Antonio Torralba,et al.  Through-Wall Human Pose Estimation Using Radio Signals , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[5]  Jia Deng,et al.  Stacked Hourglass Networks for Human Pose Estimation , 2016, ECCV.

[6]  Antonio Torralba,et al.  Generating Videos with Scene Dynamics , 2016, NIPS.

[7]  Yaser Sheikh,et al.  OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8]  Jitendra Malik,et al.  Recovering human body configurations using pairwise constraints between parts , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[9]  Rob Fergus,et al.  Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks , 2015, NIPS.

[10]  Yi Yang,et al.  Articulated pose estimation with flexible mixtures-of-parts , 2011, CVPR 2011.

[11]  Peter V. Gehler,et al.  Keep It SMPL: Automatic Estimation of 3D Human Pose and Shape from a Single Image , 2016, ECCV.

[12]  Xiaogang Wang,et al.  Multi-context Attention for Human Pose Estimation , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Christian Szegedy,et al.  DeepPose: Human Pose Estimation via Deep Neural Networks , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[14]  Jonathan Tompson,et al.  Efficient object localization using Convolutional Networks , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Cristian Sminchisescu,et al.  Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[16]  Xiaogang Wang,et al.  Structured Feature Learning for Pose Estimation , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Trevor Darrell,et al.  Adversarial Discriminative Domain Adaptation , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Alan L. Yuille,et al.  Articulated Pose Estimation by a Graphical Model with Image Dependent Pairwise Relations , 2014, NIPS.

[19]  Ming-Hsuan Yang,et al.  Unsupervised Domain Adaptation for Face Recognition in Unlabeled Videos , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[20]  Andrew Zisserman,et al.  2D Human Pose Estimation in TV Shows , 2009, Statistical and Geometrical Approaches to Visual Motion Analysis.

[21]  Xiaowei Zhou,et al.  Sparseness Meets Deepness: 3D Human Pose Estimation from Monocular Video , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  John E. Hopcroft,et al.  Stacked Generative Adversarial Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Jian Yang,et al.  Adversarial Learning of Structure-Aware Fully Convolutional Networks for Landmark Localization , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[24]  Jaakko Lehtinen,et al.  Progressive Growing of GANs for Improved Quality, Stability, and Variation , 2017, ICLR.

[25]  Dimitris N. Metaxas,et al.  StackGAN: Text to Photo-Realistic Image Synthesis with Stacked Generative Adversarial Networks , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[26]  Stan Sclaroff,et al.  Fast globally optimal 2D human detection with loopy graph models , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[27]  Song-Chun Zhu,et al.  Learning Knowledge-guided Pose Grammar Machine for 3D Human Pose Estimation , 2017, ArXiv.

[28]  Abhinav Gupta,et al.  Generative Image Modeling Using Style and Structure Adversarial Networks , 2016, ECCV.

[29]  Hans-Peter Seidel,et al.  VNect , 2017, ACM Trans. Graph..

[30]  Xiaogang Wang,et al.  Learning Feature Pyramids for Human Pose Estimation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[31]  Victor S. Lempitsky,et al.  Unsupervised Domain Adaptation by Backpropagation , 2014, ICML.

[32]  Alexei A. Efros,et al.  Image-to-Image Translation with Conditional Adversarial Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Bernt Schiele,et al.  2D Human Pose Estimation: New Benchmark and State of the Art Analysis , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[34]  Francesc Moreno-Noguer,et al.  3D Human Pose Estimation from a Single Image via Distance Matrix Regression , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Ming-Yu Liu,et al.  Coupled Generative Adversarial Networks , 2016, NIPS.

[36]  Yichen Wei,et al.  Towards 3D Human Pose Estimation in the Wild: A Weakly-Supervised Approach , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[37]  Song-Chun Zhu,et al.  Monocular 3D Human Pose Estimation by Predicting Depth on Joints , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[38]  Xiu-Shen Wei,et al.  Adversarial PoseNet: A Structure-Aware Convolutional Network for Human Pose Estimation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[39]  Xiaowei Zhou,et al.  Coarse-to-Fine Volumetric Prediction for Single-Image 3D Human Pose , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[40]  拓海 杉山,et al.  “Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks”の学習報告 , 2017 .

[41]  Lourdes Agapito,et al.  Lifting from the Deep: Convolutional 3D Pose Estimation from a Single Image , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[42]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[43]  Yuandong Tian,et al.  Single Image 3D Interpreter Network , 2016, ECCV.

[44]  Trevor Darrell,et al.  Simultaneous Deep Transfer Across Domains and Tasks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[45]  Peter V. Gehler,et al.  Poselet Conditioned Pictorial Structures , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[46]  Ruben Villegas,et al.  Learning to Generate Long-term Future via Hierarchical Prediction , 2017, ICML.

[47]  Martin A. Fischler,et al.  The Representation and Matching of Pictorial Structures , 1973, IEEE Transactions on Computers.

[48]  Chuang Gan,et al.  Recurrent Topic-Transition GAN for Visual Paragraph Generation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[49]  Antoni B. Chan,et al.  3D Human Pose Estimation from Monocular Images with Deep Convolutional Neural Network , 2014, ACCV.

[50]  Vincent Lepetit,et al.  Direct Prediction of 3D Body Poses from Motion Compensated Sequences , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[51]  Abhinav Gupta,et al.  A-Fast-RCNN: Hard Positive Generation via Adversary for Object Detection , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[52]  Mohan S. Kankanhalli,et al.  Marker-Less 3D Human Motion Capture with Monocular Image Sequence and Height-Maps , 2016, ECCV.

[53]  Taesung Park,et al.  CyCADA: Cycle-Consistent Adversarial Domain Adaptation , 2017, ICML.

[54]  Pascal Fua,et al.  Monocular 3D Human Pose Estimation in the Wild Using Improved CNN Supervision , 2016, 2017 International Conference on 3D Vision (3DV).

[55]  T. Kanade,et al.  Reconstructing 3D Human Pose from 2D Image Landmarks , 2012, ECCV.

[56]  Vincent Lepetit,et al.  Structured Prediction of 3D Human Pose with Deep Neural Networks , 2016, BMVC.

[57]  Yao Zhao,et al.  Object Region Mining with Adversarial Erasing: A Simple Classification to Semantic Segmentation Approach , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[58]  James J. Little,et al.  A Simple Yet Effective Baseline for 3d Human Pose Estimation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[59]  Soumith Chintala,et al.  Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks , 2015, ICLR.

[60]  Alexei A. Efros,et al.  Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).