Continuous Surface Embeddings

In this work, we focus on the task of learning and representing dense correspondences in deformable object categories. While this problem has been considered before, solutions so far have been rather ad-hoc for specific object types (i.e., humans), often with significant manual work involved. However, scaling the geometry understanding to all objects in nature requires more automated approaches that can also express correspondences between related, but geometrically different objects. To this end, we propose a new, learnable image-based representation of dense correspondences. Our model predicts, for each pixel in a 2D image, an embedding vector of the corresponding vertex in the object mesh, therefore establishing dense correspondences between image pixels and 3D object geometry. We demonstrate that the proposed approach performs on par or better than the state-of-the-art methods for dense pose estimation for humans, while being conceptually simpler. We also collect a new in-the-wild dataset of dense correspondences for animal classes and demonstrate that our framework scales naturally to the new deformable object categories.

[1]  Varun Ramakrishna,et al.  Convolutional Pose Machines , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Ross B. Girshick,et al.  LVIS: A Dataset for Large Vocabulary Instance Segmentation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Daniel Cremers,et al.  The wave kernel signature: A quantum mechanical approach to shape analysis , 2011, 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops).

[4]  Björn Ommer,et al.  CliqueCNN: Deep Unsupervised Exemplar Learning , 2016, NIPS.

[5]  Pietro Perona,et al.  Caltech-UCSD Birds 200 , 2010 .

[6]  Yuting Zhang,et al.  Unsupervised Discovery of Object Landmarks as Structural Representations , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[7]  Yong Jae Lee,et al.  Interspecies Knowledge Transfer for Facial Keypoint Detection , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Ron Kimmel,et al.  On Bending Invariant Signatures for Surfaces , 2003, IEEE Trans. Pattern Anal. Mach. Intell..

[9]  Ankush Gupta,et al.  Unsupervised Learning of Object Landmarks through Conditional Image Generation , 2018, NeurIPS.

[10]  Andrea Vedaldi,et al.  C3DPO: Canonical 3D Pose Networks for Non-Rigid Structure From Motion , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[11]  Jitendra Malik,et al.  Learning Category-Specific Mesh Reconstruction from Image Collections , 2018, ECCV.

[12]  Jia Deng,et al.  Stacked Hourglass Networks for Human Pose Estimation , 2016, ECCV.

[13]  Peter Robinson,et al.  Human and sheep facial landmarks localisation by triplet interpolated features , 2015, 2016 IEEE Winter Conference on Applications of Computer Vision (WACV).

[14]  Guillermo Sapiro,et al.  A Gromov-Hausdorff Framework with Diffusion Geometry for Topologically-Robust Non-rigid Shape Matching , 2010, International Journal of Computer Vision.

[15]  Matthias Bethge,et al.  Using DeepLabCut for 3D markerless pose estimation across species and behaviors , 2018 .

[16]  Shubham Tulsiani,et al.  Canonical Surface Mapping via Geometric Cycle Consistency , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[17]  Mirela Ben-Chen,et al.  Deblurring and Denoising of Maps between Shapes , 2017, Comput. Graph. Forum.

[18]  Ming Jiang,et al.  Parsing R-CNN for Instance-Level Human Analysis , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  David A. Forsyth,et al.  Learning to Localize Little Landmarks , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[21]  David W. Jacobs,et al.  WarpNet: Weakly Supervised Matching for Single-View Reconstruction , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Yaser Sheikh,et al.  OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[23]  Jingkuan Song,et al.  Adaptive Multi-Path Aggregation for Human DensePose Estimation in the Wild , 2019, ACM Multimedia.

[24]  Andrea Vedaldi,et al.  Transferring Dense Pose to Proximal Animal Classes , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Andrea Vedaldi,et al.  Unsupervised Learning of Landmarks by Descriptor Vector Exchange , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[26]  Bernt Schiele,et al.  PoseTrack: A Benchmark for Human Pose Estimation and Tracking , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[27]  Iasonas Kokkinos,et al.  DensePose: Dense Human Pose Estimation in the Wild , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[28]  Michael J. Black,et al.  3D Menagerie: Modeling the 3D Shape and Pose of Animals , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Michael J. Black,et al.  SMPL: A Skinned Multi-Person Linear Model , 2023 .

[30]  Iasonas Kokkinos,et al.  Scale-invariant heat kernel signatures for non-rigid shape recognition , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[31]  Mark Everingham,et al.  Learning effective human pose estimation from inaccurate annotation , 2011, CVPR 2011.

[32]  Jitendra Malik,et al.  Pose Induction for Novel Object Categories , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[33]  Tobias Gurdan Sparse Modeling of Intrinsic Correspondences , 2014 .

[34]  Björn Ommer,et al.  Deep unsupervised learning of visual similarities , 2018, Pattern Recognit..

[35]  Raif M. Rustamov,et al.  Laplace-Beltrami eigenfunctions for deformation invariant shape representation , 2007 .

[36]  Iasonas Kokkinos,et al.  Slim DensePose: Thrifty Learning From Sparse Annotations and Motion Cues , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Andrea Vedaldi,et al.  Unsupervised object learning from dense equivariant image labelling , 2017, NIPS 2017.

[38]  Trevor Darrell,et al.  Part-Based R-CNNs for Fine-Grained Category Detection , 2014, ECCV.

[39]  Jing Ren,et al.  ZoomOut: Spectral Upsampling for Efficient Shape Correspondence , 2019, ACM Trans. Graph..

[40]  Pascal Fua,et al.  DeepFly3D: A deep learning-based approach for 3D limb and appendage tracking in tethered, adult Drosophila , 2019, bioRxiv.

[41]  Michael J. Black,et al.  Three-D Safari: Learning to Estimate Zebra Pose, Shape, and Texture From Images “In the Wild” , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[42]  Weiyu Zhang,et al.  From Actemes to Action: A Strongly-Supervised Representation for Detailed Action Understanding , 2013, 2013 IEEE International Conference on Computer Vision.

[43]  Björn Ommer,et al.  LSTM Self-Supervision for Detailed Behavior Analysis , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[44]  Ron Kimmel,et al.  Generalized multidimensional scaling: A framework for isometry-invariant partial surface matching , 2006, Proceedings of the National Academy of Sciences of the United States of America.

[45]  Mark Everingham,et al.  Clustered Pose and Nonlinear Appearance Models for Human Pose Estimation , 2010, BMVC.

[46]  Weiyao Lin,et al.  Amur Tiger Re-identification in the Wild , 2019, ArXiv.

[47]  Jaakko Lehtinen,et al.  Learning to Predict 3D Objects with an Interpolation-based Differentiable Renderer , 2019, NeurIPS.

[48]  Andrea Vedaldi,et al.  Unsupervised Learning of Object Landmarks by Factorized Spatial Embeddings , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[49]  Bernt Schiele,et al.  2D Human Pose Estimation: New Benchmark and State of the Art Analysis , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[50]  Björn Ommer,et al.  Unsupervised Part-Based Disentangling of Object Shape and Appearance , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[51]  Andrew W. Fitzgibbon,et al.  Creatures great and SMAL: Recovering the shape and motion of animals from video , 2018, ACCV.

[52]  Maks Ovsjanikov,et al.  Functional maps , 2012, ACM Trans. Graph..

[53]  Xavier Bresson,et al.  Functional correspondence by matrix completion , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[54]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[55]  Abhinav Gupta,et al.  Articulation-Aware Canonical Surface Mapping , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[56]  Leonidas J. Guibas,et al.  A concise and provably informative multi-scale signature based on heat diffusion , 2009 .

[57]  Leonidas J. Guibas,et al.  Shape google: Geometric words and expressions for invariant shape retrieval , 2011, TOGS.