Learning Unsupervised Hierarchical Part Decomposition of 3D Objects From a Single RGB Image

Humans perceive the 3D world as a set of distinct objects that are characterized by various low-level (geometry, reflectance) and high-level (connectivity, adjacency, symmetry) properties. Recent methods based on convolutional neural networks (CNNs) demonstrated impressive progress in 3D reconstruction, even when using a single 2D image as input. However, the majority of these methods focuses on recovering the local 3D geometry of an object without considering its part-based decomposition or relations between parts. We address this challenging problem by proposing a novel formulation that allows to jointly recover the geometry of a 3D object as a set of primitives as well as their latent hierarchical structure without part-level supervision. Our model recovers the higher level structural decomposition of various objects in the form of a binary tree of primitives, where simple parts are represented with fewer primitives and more complex parts are modeled with more components. Our experiments on the ShapeNet and D-FAUST datasets demonstrate that considering the organization of parts indeed facilitates reasoning about 3D geometry.

[1]  Ming-Yu Liu,et al.  PointFlow: 3D Point Cloud Generation With Continuous Normalizing Flows , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[2]  Max Jaderberg,et al.  Unsupervised Learning of 3D Structure from Images , 2016, NIPS.

[3]  Leonidas J. Guibas,et al.  KPConv: Flexible and Deformable Convolution for Point Clouds , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[4]  Jun Li,et al.  Symmetry Hierarchy of Man‐Made Objects , 2011, Comput. Graph. Forum.

[5]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[6]  Michael J. Black,et al.  Dynamic FAUST: Registering Human Bodies in Motion , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Anders P. Eriksson,et al.  Implicit Surface Representations As Layers in Neural Networks , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[8]  Luc Van Gool,et al.  RayNet: Learning Volumetric 3D Reconstruction with Ray Potentials , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[9]  Leonidas J. Guibas,et al.  ShapeNet: An Information-Rich 3D Model Repository , 2015, ArXiv.

[10]  Irving Biederman,et al.  Human image understanding: Recent research and a theory , 1985, Comput. Vis. Graph. Image Process..

[11]  Mathieu Aubry,et al.  Learning elementary structures for 3D shape generation and matching , 2019, NeurIPS.

[12]  R. Zemel,et al.  Neural Relational Inference for Interacting Systems , 2018, ICML.

[13]  Hao Zhang,et al.  Learning Implicit Fields for Generative Shape Modeling , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  David H. Laidlaw,et al.  Constructive solid geometry for polyhedral objects , 1986, SIGGRAPH.

[15]  Sebastian Nowozin,et al.  Occupancy Networks: Learning 3D Reconstruction in Function Space , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Jun Li,et al.  Im2Struct: Recovering 3D Shape Structure from a Single RGB Image , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[17]  R. Baillargeon Infants' Physical World , 2004 .

[18]  Gernot Riegler,et al.  OctNet: Learning Deep 3D Representations at High Resolutions , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Silvio Savarese,et al.  3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction , 2016, ECCV.

[20]  Andrea Tagliasacchi,et al.  CvxNet: Learnable Convex Decomposition , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Hao Su,et al.  A Point Set Generation Network for 3D Object Reconstruction from a Single Image , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Jiajun Wu,et al.  Learning to Infer and Execute 3D Shape Programs , 2019, ICLR.

[23]  Katherine D. Kinzler,et al.  Core systems in human cognition. , 2007, Progress in brain research.

[24]  Leonidas J. Guibas,et al.  Supervised Fitting of Geometric Primitives to 3D Point Clouds , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Ersin Yumer,et al.  3D-PRNN: Generating Shape Primitives with Recurrent Neural Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[26]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[27]  Shengping Zhang,et al.  Pix2Vox: Context-Aware 3D Reconstruction From Single and Multi-View Images , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[28]  Daniel Cohen-Or,et al.  Structure-aware shape processing , 2013, Eurographics.

[29]  Armando Solar-Lezama,et al.  Learning to Infer Graphics Programs from Hand-Drawn Images , 2017, NeurIPS.

[30]  Leonidas J. Guibas,et al.  StructureNet , 2019, ACM Trans. Graph..

[31]  Subhransu Maji,et al.  3D Shape Induction from 2D Views of Multiple Objects , 2016, 2017 International Conference on 3D Vision (3DV).

[32]  Hao Li,et al.  PIFu: Pixel-Aligned Implicit Function for High-Resolution Clothed Human Digitization , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[33]  Leonidas J. Guibas,et al.  PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space , 2017, NIPS.

[34]  Jürgen Schmidhuber,et al.  Relational Neural Expectation Maximization: Unsupervised Discovery of Objects and their Interactions , 2018, ICLR.

[35]  Thomas A. Funkhouser,et al.  Learning Shape Templates With Structured Implicit Functions , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[36]  William Rowan Hamilton,et al.  ON QUATERNIONS, OR ON A NEW SYSTEM OF IMAGINARIES IN ALGEBRA , 1847 .

[37]  Chen Sun,et al.  Unsupervised Discovery of Parts, Structure, and Dynamics , 2019, ICLR.

[38]  Donald D. Hoffman,et al.  Parts of recognition , 1984, Cognition.

[39]  Jitendra Malik,et al.  Learning a Multi-View Stereo Machine , 2017, NIPS.

[40]  Leonidas J. Guibas,et al.  Learning Representations and Generative Models for 3D Point Clouds , 2017, ICML.

[41]  Jürgen Schmidhuber,et al.  Neural Expectation Maximization , 2017, NIPS.

[42]  Yiyi Liao,et al.  Deep Marching Cubes: Learning Explicit Surface Representations , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[43]  Jitendra Malik,et al.  Learning Category-Specific Mesh Reconstruction from Image Collections , 2018, ECCV.

[44]  Andreas Geiger,et al.  Learning Non-Volumetric Depth Fusion Using Successive Reprojections , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[45]  Theodore Lim,et al.  Generative and Discriminative Voxel Modeling with Convolutional Neural Networks , 2016, ArXiv.

[46]  Xiaoguang Han,et al.  Deep Mesh Reconstruction From Single RGB Images via Topology Modification Networks , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[47]  Fangqiao Hu,et al.  Learning Structural Graph Layouts and 3D Shapes for Long Span Bridges 3D Reconstruction , 2019, ArXiv.

[48]  Mathieu Aubry,et al.  AtlasNet: A Papier-M\^ach\'e Approach to Learning 3D Surface Generation , 2018, CVPR 2018.

[49]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[50]  Xiaojuan Qi,et al.  GAL: Geometric Adversarial Loss for Single-View 3D-Object Reconstruction , 2018, ECCV.

[51]  Leonidas J. Guibas,et al.  GRASS: Generative Recursive Autoencoders for Shape Structures , 2017, ACM Trans. Graph..

[52]  Luc Van Gool,et al.  Learned Multi-patch Similarity , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[53]  Wei Liu,et al.  Pixel2Mesh: Generating 3D Mesh Models from Single RGB Images , 2018, ECCV.

[54]  Jiajun Wu,et al.  Learning to Describe Scenes with Programs , 2018, ICLR.

[55]  Andreas Geiger,et al.  Superquadrics Revisited: Learning 3D Shape Parsing Beyond Cuboids , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[56]  Leonidas J. Guibas,et al.  Learning Shape Abstractions by Assembling Volumetric Primitives , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[57]  Duygu Ceylan,et al.  DISN: Deep Implicit Surface Network for High-quality Single-view 3D Reconstruction , 2019, NeurIPS.

[58]  Andreas Geiger,et al.  Learning 3D Shape Completion from Laser Scan Data with Weak Supervision , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[59]  William Rowan Hamilton,et al.  XI. On quaternions; or on a new system of imaginaries in algebra , 1848 .

[60]  Subhransu Maji,et al.  CSGNet: Neural Shape Parser for Constructive Solid Geometry , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[61]  Richard A. Newcombe,et al.  DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).