Semi-supervised Dense Keypointsusing Unlabeled Multiview Images

This paper presents a new end-to-end semi-supervised framework to learn a dense keypoint detector using unlabeled multiview images. A key challenge lies in finding the exact correspondences between the dense keypoints in multiple views since the inverse of keypoint mapping can be neither analytically derived nor differentiated. This limits applying existing multiview supervision approaches on sparse keypoint detection that rely on the exact correspondences. To address this challenge, we derive a new probabilistic epipolar constraint that encodes the two desired properties. (1) Soft correspondence: we define a matchability, which measures a likelihood of a point matching to the other image’s corresponding point, thus relaxing the exact correspondences’ requirement. (2) Geometric consistency: every point in the continuous correspondence fields must satisfy the multiview consistency collectively. We formulate a probabilistic epipolar constraint using a weighted average of epipolar errors through the matchability thereby generalizing the point-to-point geometric error to the field-to-field geometric error. This generalization facilitates learning a geometrically coherent dense keypoint detection model by utilizing a large number of unlabeled multiview images. Additionally, to prevent degenerative cases, we employ a distillation-based regularization by using a pretrained model. Finally, we design a new neural network architecture, made of twin networks, that effectively minimizes the probabilistic epipolar errors of all possible correspondences between two view images by building affinity matrices. Our method shows superior performance compared to existing methods, including non-differentiable bootstrapping in terms of keypoint accuracy, multiview consistency, and 3D reconstruction accuracy.

[1]  Iasonas Kokkinos,et al.  Slim DensePose: Thrifty Learning From Sparse Annotations and Motion Cues , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Michael J. Black,et al.  SMPL: A Skinned Multi-Person Linear Model , 2023 .

[3]  Wanli Ouyang,et al.  3D Human Mesh Regression With Dense Correspondence , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Zhenan Sun,et al.  Learning 3D Human Shape and Pose From Dense Body Parts , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[5]  Iasonas Kokkinos,et al.  Dense Pose Transfer , 2018, ECCV.

[6]  Jihun Yu,et al.  HUMBI: A Large Multiview Dataset of Human Body Expressions , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Wenjun Zeng,et al.  VoxelPose: Towards Multi-camera 3D Human Pose Estimation in Wild Environment , 2020, ECCV.

[8]  Katerina Fragkiadaki,et al.  Epipolar Transformers , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Dong Liu,et al.  Bottom-Up Human Pose Estimation by Ranking Heatmap-Guided Adaptive Keypoint Estimates , 2020, ArXiv.

[10]  Michael J. Black,et al.  MoSh: motion and shape capture from sparse markers , 2014, ACM Trans. Graph..

[11]  Qi-Xing Huang,et al.  Dense Human Body Correspondences Using Convolutional Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  B. S. Manjunath,et al.  Weakly Supervised Manifold Learning for Dense Semantic Object Correspondence , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[13]  Bodo Rosenhahn,et al.  Supplementary Material to: Recovering Accurate 3D Human Pose in The Wild Using IMUs and a Moving Camera , 2018 .

[14]  Andrea Vedaldi,et al.  Unsupervised Learning of Object Landmarks by Factorized Spatial Embeddings , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[15]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[16]  Florian Schroff,et al.  View-Invariant Probabilistic Embedding for Human Pose , 2020, ECCV.

[17]  Kostas Daniilidis,et al.  Convolutional Mesh Regression for Single-Image Human Shape Reconstruction , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Victor Lempitsky,et al.  Learnable Triangulation of Human Pose , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[19]  Takeo Kanade,et al.  Panoptic Studio: A Massively Multiview System for Social Interaction Capture , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[20]  Alexei A. Efros,et al.  Learning Dense Correspondence via 3D-Guided Cycle Consistency , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Yaser Sheikh,et al.  Hand Keypoint Detection in Single Images Using Multiview Bootstrapping , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Cheng Li,et al.  Delving Deep Into Hybrid Annotations for 3D Human Recovery in the Wild , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[23]  Iasonas Kokkinos,et al.  DensePose: Dense Human Pose Estimation in the Wild , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[24]  Andrew W. Fitzgibbon,et al.  Metric Regression Forests for Correspondence Estimation , 2015, International Journal of Computer Vision.

[25]  Iasonas Kokkinos,et al.  DenseReg: Fully Convolutional Dense Shape Regression In-the-Wild , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Yasamin Jafarian,et al.  Learning High Fidelity Depths of Dressed Humans by Watching Social Media Dance Videos , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Wan-Yen Lo,et al.  Accelerating 3D deep learning with PyTorch3D , 2019, SIGGRAPH Asia 2020 Courses.

[28]  Simon Lucey,et al.  Dense Semantic Correspondence Where Every Pixel is a Classifier , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[29]  Jitendra Malik,et al.  End-to-End Recovery of Human Shape and Pose , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[30]  Yuan Yao,et al.  MONET: Multiview Semi-Supervised Keypoint Detection via Epipolar Divergence , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[31]  Pascal Fua,et al.  Learning Monocular 3D Human Pose Estimation from Multi-view Images , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[32]  Kostas Daniilidis,et al.  TexturePose: Supervising Human Mesh Estimation With Texture Consistency , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[33]  Emre Akbas,et al.  Self-Supervised Learning of 3D Human Pose Using Multi-View Geometry , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  Michael J. Black,et al.  Learning to Reconstruct 3D Human Pose and Shape via Model-Fitting in the Loop , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[35]  Bernhard P. Wrobel,et al.  Multiple View Geometry in Computer Vision , 2001 .

[36]  Marcus A. Magnor,et al.  Tex2Shape: Detailed Full Human Body Geometry From a Single Image , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[37]  Andrew W. Fitzgibbon,et al.  The Vitruvian manifold: Inferring dense correspondences for one-shot human pose estimation , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[38]  Chen Qian,et al.  Weakly-Supervised Discovery of Geometry-Aware Representation for 3D Human Pose Estimation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[40]  Yaser Sheikh,et al.  Capture Dense: Markerless Motion Capture Meets Dense Pose Estimation , 2018, ArXiv.

[41]  Yizhou Wang,et al.  MetaFuse: A Pre-trained Fusion Model for Human Pose Estimation , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[42]  Abhishek Sharma,et al.  Multiview-Consistent Semi-Supervised Learning for 3D Human Pose Estimation , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[43]  Andrea Vedaldi,et al.  Unsupervised Learning of Landmarks by Descriptor Vector Exchange , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[44]  P. Cochat,et al.  Et al , 2008, Archives de pediatrie : organe officiel de la Societe francaise de pediatrie.

[45]  Cristian Sminchisescu,et al.  Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[46]  Wenjun Zeng,et al.  AdaFuse: Adaptive Multiview Fusion for Accurate Human Pose Estimation in the Wild , 2020, International Journal of Computer Vision.

[47]  Yaser Sheikh,et al.  Total Capture: A 3D Deformation Model for Tracking Faces, Hands, and Bodies , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[48]  Wenjun Zeng,et al.  Cross View Fusion for 3D Human Pose Estimation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[49]  Pavlo Molchanov,et al.  Weakly-Supervised 3D Human Pose Learning via Multi-View Images in the Wild , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[50]  Pascal Fua,et al.  Unsupervised Geometry-Aware Representation for 3D Human Pose Estimation , 2018, ECCV.

[51]  Iasonas Kokkinos,et al.  HoloPose: Holistic 3D Human Reconstruction In-The-Wild , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[52]  Seng Bum Michael Yoo,et al.  Automated markerless pose estimation in freely moving macaques with OpenMonkeyStudio , 2020, Nature Communications.

[53]  Andrea Vedaldi,et al.  Unsupervised learning of object frames by dense equivariant image labelling , 2017, NIPS.

[54]  Song-Chun Zhu,et al.  DenseRaC: Joint 3D Pose and Shape Estimation by Dense Render-and-Compare , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[55]  Andrea Vedaldi,et al.  Transferring Dense Pose to Proximal Animal Classes , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).