3D Compositional Zero-shot Learning with DeCompositional Consensus

Parts represent a basic unit of geometric and semantic similarity across different objects. We argue that part knowledge should be composable beyond the observed object classes. Towards this, we present 3D Compositional Zero-shot Learning as a problem of part generalization from seen to unseen object classes for semantic segmentation. We provide a structured study by benchmarking the task on the proposed Compositional-PartNet dataset, which is created by processing the original PartNet to maximize part overlap across different objects. Existing point cloud part segmentation methods fail to generalize to unseen object classes in this setting. As a solution, we propose DeCompositional Consensus, which combines a part segmentation network with a part scoring network. The key intuition behind our approach is that a segmentation mask over a set of parts should agree with the part scores obtained when each part is taken apart. The two networks reason over the different part combinations defined in a per-object part prior to generate the most suitable segmentation mask. We demonstrate that our method enables compositional zero-shot segmentation and generalized zero-shot classification, and establishes the state of the art on both tasks.
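To make the consensus idea concrete, the following is a minimal sketch under several assumptions that the abstract does not state. The sketch assumes that the segmentation network outputs per-point logits over a shared part vocabulary. It treats the per-object part prior as a dict from object class to allowed part indices. The part scoring network is replaced by a hypothetical callable `score_part` that returns probabilities over the same vocabulary for one segment. Consensus is taken here as the mean probability the scorer assigns to each segment's predicted label. All names, the toy data, and the averaging rule are illustrative and do not reproduce the paper's actual formulation.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical types: per-point logits over a global part vocabulary, and a
# part scorer mapping one segment (its point indices) to per-part probabilities.
Logits = Sequence[Sequence[float]]
Scorer = Callable[[List[int]], Sequence[float]]


def segment_with_prior(logits: Logits, allowed: Sequence[int]) -> List[int]:
    """Label each point with its highest-scoring part among the allowed parts only."""
    return [max(allowed, key=lambda p: row[p]) for row in logits]


def consensus(mask: List[int], score_part: Scorer) -> float:
    """Take each predicted part apart and measure how strongly the scorer agrees
    with the label the segmentation assigned to it (assumed: mean over parts)."""
    segments: Dict[int, List[int]] = {}
    for idx, part in enumerate(mask):
        segments.setdefault(part, []).append(idx)
    agreement = [score_part(points)[part] for part, points in segments.items()]
    return sum(agreement) / len(agreement)


def decompositional_consensus(
    logits: Logits, part_prior: Dict[str, Tuple[int, ...]], score_part: Scorer
) -> Tuple[str, List[int]]:
    """Try every part combination in the per-object prior and keep the mask
    (and its object class) with the highest consensus."""
    best_obj, best_mask, best_c = "", [], float("-inf")
    for obj, parts in part_prior.items():
        mask = segment_with_prior(logits, parts)
        c = consensus(mask, score_part)
        if c > best_c:
            best_obj, best_mask, best_c = obj, mask, c
    return best_obj, best_mask


# Toy usage. Part vocabulary (hypothetical): 0=seat, 1=back, 2=leg, 3=tabletop.
part_prior = {"chair": (0, 1, 2), "table": (3, 2)}
logits = [
    [2.0, 0.1, 0.0, 1.9],
    [0.2, 1.5, 0.0, 1.6],
    [0.0, 0.0, 2.0, 0.0],
    [0.0, 0.1, 1.8, 0.2],
]
# Stand-in for a part scoring network: fixed probabilities per segment.
toy_scores = {
    (0,): [0.8, 0.1, 0.05, 0.05],
    (1,): [0.1, 0.7, 0.1, 0.1],
    (2, 3): [0.05, 0.05, 0.85, 0.05],
    (0, 1): [0.3, 0.3, 0.1, 0.3],
}


def score_part(points: List[int]) -> Sequence[float]:
    return toy_scores[tuple(points)]


obj, mask = decompositional_consensus(logits, part_prior, score_part)
print(obj, mask)
```

In this toy case, the chair combination splits the top points into a seat and a back that the scorer recognizes individually. The table combination merges them into an ambiguous "tabletop" segment, so the chair hypothesis wins. Because the winning combination comes from one object's prior, this sketch yields both a segmentation mask and an object label from the same search.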
