Paper Information - PartSLIP: Low-Shot Part Segmentation for 3D Point Clouds via Pretrained Image-Language Models


Generalizable 3D part segmentation is important but challenging in vision and robotics. Training deep models via conventional supervised methods requires large-scale 3D datasets with fine-grained part annotations, which are costly to collect. This paper explores an alternative approach to low-shot part segmentation of 3D point clouds by leveraging a pretrained image-language model, GLIP, which achieves superior performance on open-vocabulary 2D detection. We transfer the rich knowledge from 2D to 3D through GLIP-based part detection on point cloud renderings and a novel 2D-to-3D label lifting algorithm. We also utilize multi-view 3D priors and few-shot prompt tuning to boost performance significantly. Extensive evaluation on the PartNet and PartNet-Mobility datasets shows that our method enables excellent zero-shot 3D part segmentation. Our few-shot version not only outperforms existing few-shot approaches by a large margin but also achieves highly competitive results compared to the fully supervised counterpart. Furthermore, we demonstrate that our method can be directly applied to iPhone-scanned point clouds without significant domain gaps.
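The idea of lifting 2D detections to 3D labels can be illustrated with a minimal sketch. This is not the paper's label lifting algorithm, only a simplified score-weighted voting scheme under stated assumptions: each view supplies a hypothetical `project` callable that maps 3D points to pixel coordinates, detections are given as plain tuples rather than actual GLIP output, and occlusion is ignored, so a point hidden behind another surface still receives votes. The function name `lift_boxes_to_points` and the dictionary keys are made up for illustration.

```python
import numpy as np


def lift_boxes_to_points(points, views, num_parts):
    """Assign each 3D point a part label by voting over 2D boxes from several views.

    points:    (N, 3) float array of 3D coordinates.
    views:     list of dicts with two hypothetical keys:
               "project" - callable mapping (N, 3) points to (N, 2) pixel coordinates,
               "boxes"   - list of (x_min, y_min, x_max, y_max, part_id, score) tuples.
    num_parts: number of part categories.
    Returns an (N,) int array of part ids, with -1 where no box covered the point.
    """
    votes = np.zeros((len(points), num_parts), dtype=np.float64)
    for view in views:
        uv = view["project"](points)  # 2D position of every point in this view
        for x0, y0, x1, y1, part_id, score in view["boxes"]:
            inside = (
                (uv[:, 0] >= x0) & (uv[:, 0] <= x1)
                & (uv[:, 1] >= y0) & (uv[:, 1] <= y1)
            )
            # Every point whose projection falls in the box gets a weighted vote.
            # Assumption: visibility is NOT checked in this sketch.
            votes[inside, part_id] += score
    labels = votes.argmax(axis=1)
    labels[votes.sum(axis=1) == 0] = -1  # points never covered by any box
    return labels


# Toy example: two orthographic "views" of four points, two part categories.
points = np.array([
    [0.1, 0.1, 0.0],
    [0.9, 0.9, 0.0],
    [0.5, 0.5, 1.0],
    [0.5, 0.5, 0.0],
])
views = [
    {   # view along the z axis: pixel coordinates are (x, y)
        "project": lambda p: p[:, :2],
        "boxes": [(0.0, 0.0, 0.3, 0.3, 0, 0.9), (0.7, 0.7, 1.0, 1.0, 1, 0.8)],
    },
    {   # view along the y axis: pixel coordinates are (x, z)
        "project": lambda p: p[:, [0, 2]],
        "boxes": [(0.4, 0.8, 0.6, 1.2, 1, 0.5)],
    },
]
labels = lift_boxes_to_points(points, views, num_parts=2)
print(labels.tolist())  # [0, 1, 1, -1]
```

The third point is missed by the first view but picked up by the second, which is the basic reason for aggregating several renderings. A practical system would also need to handle visibility and conflicting boxes, which this sketch leaves out.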
