Zero-shot Point Cloud Segmentation by Transferring Geometric Primitives

We investigate transductive zero-shot point cloud semantic segmentation, where labels of unseen classes are unavailable during training. 3D geometric elements are essential cues for inferring the type of a 3D object: if two categories share similar geometric primitives, they also have similar semantic representations. Based on this observation, we propose a novel framework that learns the geometric primitives shared by objects of seen and unseen categories and uses them to transfer knowledge from seen to unseen categories. Specifically, a group of learnable prototypes automatically encodes geometric primitives via back-propagation. The visual representation of a point is then formulated as the similarity vector of its feature to the prototypes, which carries semantic cues for both seen and unseen categories. Moreover, since a 3D object is composed of multiple geometric primitives, we formulate the semantic representation as a mixture-distributed embedding to enable fine-grained matching with the visual representation. Finally, to learn the geometric primitives effectively and alleviate misclassification, we propose a novel Unknown-aware InfoNCE Loss to align the visual and semantic representations. As a result, guided by semantic representations, the network recognizes novel objects represented with geometric primitives. Extensive experiments show that our method significantly outperforms other state-of-the-art methods in harmonic mean intersection-over-union (hIoU), with improvements of 17.8%, 30.4%, and 9.2% on the S3DIS, ScanNet, and SemanticKITTI datasets, respectively. Code will be released.
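Two ideas from the abstract can be illustrated with a minimal sketch: representing a point by its similarity to a set of prototypes, and computing the hIoU metric. The sketch is not the authors' implementation. It assumes cosine similarity as the similarity measure (the abstract does not specify one), uses fixed toy prototypes in place of prototypes learned via back-propagation, and assumes the definition of hIoU commonly used in generalized zero-shot segmentation, namely the harmonic mean of the mIoU on seen classes and the mIoU on unseen classes. All function names and values are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention for a zero vector: no similarity
    return dot / (norm_a * norm_b)


def prototype_similarity_vector(point_feature, prototypes):
    """Represent a point by its similarity to every prototype.

    In the paper the prototypes are learnable; here they are fixed toy values.
    """
    return [cosine_similarity(point_feature, p) for p in prototypes]


def harmonic_miou(miou_seen, miou_unseen):
    """Harmonic mean of seen-class and unseen-class mIoU (assumed hIoU definition)."""
    total = miou_seen + miou_unseen
    if total == 0.0:
        return 0.0
    return 2.0 * miou_seen * miou_unseen / total


# Toy example: three hypothetical prototypes in a 3-dimensional feature space.
prototypes = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
point_feature = [3.0, 4.0, 0.0]

visual_representation = prototype_similarity_vector(point_feature, prototypes)
print(visual_representation)

# Hypothetical mIoU values, chosen only to show how the metric behaves.
print(harmonic_miou(0.6, 0.3))
```

Because the harmonic mean is dominated by the smaller of its two inputs, a model that segments seen classes well but fails on unseen classes still receives a low hIoU, which is why the metric is used to report generalized zero-shot performance.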
