PartSLIP: Low-Shot Part Segmentation for 3D Point Clouds via Pretrained Image-Language Models

Generalizable 3D part segmentation is important but challenging in vision and robotics. Training deep models with conventional supervised methods requires large-scale 3D datasets with fine-grained part annotations, which are costly to collect. This paper explores an alternative approach to low-shot part segmentation of 3D point clouds that leverages a pretrained image-language model, GLIP, which achieves superior performance on open-vocabulary 2D detection. We transfer this rich knowledge from 2D to 3D through GLIP-based part detection on point cloud renderings and a novel 2D-to-3D label lifting algorithm. We also use multi-view 3D priors and few-shot prompt tuning to boost performance significantly. Extensive evaluation on the PartNet and PartNet-Mobility datasets shows that our method enables excellent zero-shot 3D part segmentation. Our few-shot version not only outperforms existing few-shot approaches by a large margin but also achieves highly competitive results compared to the fully supervised counterpart. Furthermore, we demonstrate that our method can be directly applied to iPhone-scanned point clouds without significant domain gaps.
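
To make the idea of lifting 2D detections to 3D concrete, the following is a minimal sketch of one simple way per-view box labels could be aggregated onto points. It is an illustration under stated assumptions, not the paper's label lifting algorithm: it uses a plain per-point majority vote, and the function name `lift_labels`, the input layout (precomputed pixel projections with `None` for invisible points), and the box format are all hypothetical. Rendering and detection are assumed to have happened elsewhere.

```python
from collections import Counter
from typing import List, Optional, Tuple

# Hypothetical formats, chosen only for this illustration.
Box = Tuple[float, float, float, float, str]  # (x_min, y_min, x_max, y_max, part_label)
Pixel = Optional[Tuple[float, float]]         # projected 2D location, None if not visible


def lift_labels(
    projections: List[List[Pixel]],
    detections: List[List[Box]],
) -> List[Optional[str]]:
    """Give each 3D point the part label that wins a vote over all views.

    projections[v][i] is the pixel of point i in view v (None if not visible).
    detections[v] is the list of labelled 2D boxes found in view v.
    This is a simple majority vote, a stand-in for a real lifting algorithm.
    """
    num_points = len(projections[0]) if projections else 0
    votes = [Counter() for _ in range(num_points)]

    for view_pixels, view_boxes in zip(projections, detections):
        for i, pixel in enumerate(view_pixels):
            if pixel is None:
                continue  # point is occluded or outside this view
            x, y = pixel
            for x0, y0, x1, y1, label in view_boxes:
                if x0 <= x <= x1 and y0 <= y <= y1:
                    votes[i][label] += 1

    # Points that never fall inside any box stay unlabelled (None).
    return [v.most_common(1)[0][0] if v else None for v in votes]


# Toy example: three points seen from two views.
projections = [
    [(10.0, 10.0), (50.0, 50.0), None],
    [(12.0, 11.0), (48.0, 52.0), (90.0, 90.0)],
]
detections = [
    [(0.0, 0.0, 20.0, 20.0, "handle")],
    [(0.0, 0.0, 20.0, 20.0, "handle"), (40.0, 40.0, 60.0, 60.0, "lid")],
]
labels = lift_labels(projections, detections)
print(labels)
```

In this toy input the first point falls inside a "handle" box in both views, the second inside a "lid" box in one view, and the third inside no box, so it remains unlabelled. A practical system would also need visibility tests, handling of overlapping boxes, and some form of spatial grouping, none of which this sketch attempts.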
