Compositionally Generalizable 3D Structure Prediction

Single-image 3D shape reconstruction is an important and long-standing problem in computer vision. A plethora of existing works is constantly pushing the state-of-the-art performance in the deep learning era. However, there remains a much more difficult and largely under-explored issue: how to generalize the learned skills to novel, unseen object categories that have very different shape geometry distributions. In this paper, we bring in the concept of compositional generalizability and propose a novel framework that factorizes the 3D shape reconstruction problem into proper sub-problems, each of which is tackled by a carefully designed neural sub-module with a generalizability guarantee. The intuition behind our formulation is that object parts (e.g., slates and cylindrical parts), their relationships (e.g., adjacency, equal length, and parallelism), and shape substructures (e.g., T-junctions and a symmetric group of parts) are mostly shared across object categories, even though the overall object geometry may look very different (e.g., chairs and cabinets). Experiments on PartNet show that we achieve superior performance to baseline methods, which validates our problem factorization and network designs.
