Unsupervised Learning of Category-Specific Symmetric 3D Keypoints from Point Sets

Automatic discovery of category-specific 3D keypoints from a collection of objects of some category is a challenging problem. One reason is that not all objects in a category necessarily have the same semantic parts. The level of difficulty adds up further when objects are represented by 3D point clouds, with variations in shape and unknown coordinate frames. We define keypoints to be category-specific, if they meaningfully represent objects' shape and their correspondences can be simply established order-wise across all objects. This paper aims at learning category-specific 3D keypoints, in an unsupervised manner, using a collection of misaligned 3D point clouds of objects from an unknown category. In order to do so, we model shapes defined by the keypoints, within a category, using the symmetric linear basis shapes without assuming the plane of symmetry to be known. The usage of symmetry prior leads us to learn stable keypoints suitable for higher misalignments. To the best of our knowledge, this is the first work on learning such keypoints directly from 3D point clouds. Using categories from four benchmark datasets, we demonstrate the quality of our learned keypoints by quantitative and qualitative evaluations. Our experiments also show that the keypoints discovered by our method are geometrically and semantically consistent.

[1]  Yi Yang,et al.  Style Aggregated Network for Facial Landmark Detection , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[2]  Jianxiong Xiao,et al.  3D ShapeNets: A deep representation for volumetric shapes , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Costas Armenakis,et al.  Automatic 3D Surface Co-Registration Using Keypoint Matching , 2017 .

[4]  Pierre Vandergheynst,et al.  FREAK: Fast Retina Keypoint , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[5]  Takeo Kanade,et al.  Shape and motion from image streams under orthography: a factorization method , 1992, International Journal of Computer Vision.

[6]  Deva Ramanan,et al.  Analyzing 3D Objects in Cluttered Images , 2012, NIPS.

[7]  Jiaxin Li,et al.  SO-Net: Self-Organizing Network for Point Cloud Analysis , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[8]  Hao Su,et al.  A Point Set Generation Network for 3D Object Reconstruction from a Single Image , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Heng Yang,et al.  In Perfect Shape: Certifiably Optimal 3D Shape Reconstruction From 2D Landmarks , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Yanping Li,et al.  A Novel Fast Retina Keypoint Extraction Algorithm for Multispectral Images Using Geometric Algebra , 2019, IEEE Access.

[11]  George Trigeorgis,et al.  The 3D Menpo Facial Landmark Tracking Challenge , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[12]  Jim Austin,et al.  3D landmark model discovery from a registered set of organic shapes , 2012, 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.

[13]  Daniel Cohen-Or,et al.  Structure-aware shape processing , 2013, Eurographics.

[14]  Yann LeCun,et al.  Signature Verification Using A "Siamese" Time Delay Neural Network , 1993, Int. J. Pattern Recognit. Artif. Intell..

[15]  Nicu Sebe,et al.  Cycle In Cycle Generative Adversarial Networks for Keypoint-Guided Image Generation , 2019, ACM Multimedia.

[16]  Xiaowei Zhou,et al.  6-DoF object pose from semantic keypoints , 2017, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[17]  Leonidas J. Guibas,et al.  Multiview Aggregation for Learning Category-Specific Shape Reconstruction , 2019, NeurIPS.

[18]  Jianxiong Xiao,et al.  SUN RGB-D: A RGB-D scene understanding benchmark suite , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Torsten Sattler,et al.  Fast image-based localization using direct 2D-to-3D matching , 2011, 2011 International Conference on Computer Vision.

[20]  Andrea Vedaldi,et al.  Unsupervised Learning of Probably Symmetric Deformable 3D Objects From Images in the Wild , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[22]  Aaron Hertzmann,et al.  Nonrigid Structure-from-Motion: Estimating Shape and Motion with Hierarchical Priors , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[23]  Michael J. Black,et al.  SMPL: A Skinned Multi-Person Linear Model , 2023 .

[24]  Adrien Bartoli,et al.  Isometric Non-rigid Shape-from-Motion in Linear Time , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Hongdong Li,et al.  UPnP: An Optimal O(n) Solution to the Absolute Pose Problem with Universal Applicability , 2014, ECCV.

[26]  Simon Lucey,et al.  Deep Non-Rigid Structure From Motion , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[27]  Andrea Vedaldi,et al.  C3DPO: Canonical 3D Pose Networks for Non-Rigid Structure From Motion , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[28]  Takeo Kanade,et al.  Nonrigid Structure from Motion in Trajectory Space , 2008, NIPS.

[29]  Xiaoou Tang,et al.  Facial Landmark Detection by Deep Multi-task Learning , 2014, ECCV.

[30]  Leonidas J. Guibas,et al.  Normalized Object Coordinate Space for Category-Level 6D Object Pose and Size Estimation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Kiriakos N. Kutulakos,et al.  Non-rigid structure from locally-rigid motion , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[32]  Paul J. Besl,et al.  A Method for Registration of 3-D Shapes , 1992, IEEE Trans. Pattern Anal. Mach. Intell..

[33]  Henning Biermann,et al.  Recovering non-rigid 3D shape from image streams , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[34]  Luc Van Gool,et al.  Speeded-Up Robust Features (SURF) , 2008, Comput. Vis. Image Underst..

[35]  Hongdong Li,et al.  A Simple Prior-Free Method for Non-rigid Structure-from-Motion Factorization , 2012, International Journal of Computer Vision.

[36]  Ross B. Girshick,et al.  Mask R-CNN , 2017, 1703.06870.

[37]  Zi Jian Yew,et al.  3DFeat-Net: Weakly Supervised Local 3D Features for Point Cloud Registration , 2018, ECCV.

[38]  David G. Lowe,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004, International Journal of Computer Vision.

[39]  Leonidas J. Guibas,et al.  PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[40]  Peter V. Gehler,et al.  Keep It SMPL: Automatic Estimation of 3D Human Pose and Shape from a Single Image , 2016, ECCV.

[41]  Jiaxin Li,et al.  USIP: Unsupervised Stable Interest Point Detection From 3D Point Clouds , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[42]  Leonidas J. Guibas,et al.  PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space , 2017, NIPS.

[43]  Natalia Gimelshein,et al.  PyTorch: An Imperative Style, High-Performance Deep Learning Library , 2019, NeurIPS.

[44]  Francesc Moreno-Noguer,et al.  3D Human Pose Estimation from a Single Image via Distance Matrix Regression , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[45]  Edmond Boyer,et al.  FeaStNet: Feature-Steered Graph Convolutions for 3D Shape Analysis , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[46]  Yuan Gao,et al.  Symmetric Non-rigid Structure from Motion for Category-Specific Object Structure Estimation , 2016, ECCV.

[47]  Michael J. Black,et al.  Dynamic FAUST: Registering Human Bodies in Motion , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[48]  Feng Zhou,et al.  Deep Deformation Network for Object Landmark Localization , 2016, ECCV.

[49]  Richard Szeliski,et al.  Modeling the World from Internet Photo Collections , 2008, International Journal of Computer Vision.

[50]  Matthew P. Reed,et al.  Modeling Body Shape from Surface Landmark Configurations , 2013, HCI.

[51]  Dacheng Tao,et al.  A Coarse-Fine Network for Keypoint Localization , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[52]  Andrew W. Fitzgibbon,et al.  Real-time human pose recognition in parts from single depth images , 2011, CVPR 2011.

[53]  Bernhard Egger,et al.  Morphable Face Models - An Open Framework , 2017, 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018).

[54]  Leonidas J. Guibas,et al.  A scalable active framework for region annotation in 3D shape collections , 2016, ACM Trans. Graph..

[55]  Wen Gao,et al.  Robust Estimation of 3D Human Poses from a Single Image , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[56]  Yuandong Tian,et al.  Single Image 3D Interpreter Network , 2016, ECCV.

[57]  Jonathan Tompson,et al.  Discovery of Latent 3D Keypoints via End-to-end Geometric Reasoning , 2018, NeurIPS.

[58]  Yaser Sheikh,et al.  OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[59]  Vincent Lepetit,et al.  DAISY: An Efficient Dense Descriptor Applied to Wide-Baseline Stereo , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[60]  Olivier D. Faugeras,et al.  The fundamental matrix: Theory, algorithms, and stability analysis , 2004, International Journal of Computer Vision.