Point Cloud Mamba: Point Cloud Learning via State Space Model

Recently, state space models have exhibited strong global modeling capabilities while, unlike transformers, keeping computational complexity linear in sequence length. This research applies this architecture to point cloud analysis. In particular, we demonstrate for the first time that Mamba-based point cloud methods can outperform previous methods based on transformers or multi-layer perceptrons (MLPs). To enable Mamba to process 3-D point cloud data more effectively, we propose a novel Consistent Traverse Serialization method that converts point clouds into 1-D point sequences while ensuring that points neighboring each other in the sequence are also spatially adjacent. Permuting the order of the x, y, and z coordinates yields six variants of Consistent Traverse Serialization, and using these variants together helps Mamba observe the point cloud data comprehensively. Furthermore, to help Mamba handle point sequences with different orders more effectively, we introduce point prompts that inform Mamba of each sequence's arrangement rule. Finally, we propose a positional encoding based on spatial coordinate mapping to better inject positional information into point cloud sequences. Point Cloud Mamba (PCM) surpasses the state-of-the-art (SOTA) point-based method PointNeXt and achieves new SOTA performance on the ScanObjectNN, ModelNet40, ShapeNetPart, and S3DIS datasets. Notably, when equipped with a more powerful local feature extraction module, PCM achieves 82.6 mIoU on S3DIS, surpassing the previous SOTA models DeLA and PTv3 by 8.5 mIoU and 7.9 mIoU, respectively. Code and models are available at https://github.com/SkyworkAI/PointCloudMamba.
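The abstract describes Consistent Traverse Serialization only at a high level. As an illustration, here is a minimal sketch under our own assumptions: points are quantized to a voxel grid of a hypothetical size `grid_size`, and the grid is walked in a serpentine (boustrophedon) order in which the middle and innermost axes reverse direction on alternate slices and rows, so that consecutive voxels on the path are face-adjacent. Each of the six permutations of the x, y, and z axes gives one variant. The helper names `quantize`, `serpentine_order`, and `serialize_all_variants` are hypothetical, and this is not the released PCM code.

```python
from itertools import permutations
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]
Cell = Tuple[int, int, int]


def quantize(points: Sequence[Point], grid_size: float) -> List[Cell]:
    """Map each point to integer voxel coordinates relative to the cloud's minimum corner."""
    mins = [min(p[d] for p in points) for d in range(3)]
    return [tuple(int((p[d] - mins[d]) // grid_size) for d in range(3)) for p in points]


def serpentine_order(cells: Sequence[Cell], axes: Tuple[int, int, int]) -> List[int]:
    """Return point indices sorted along a serpentine (snake) path through the voxel grid.

    axes[0] is the outermost (slowest-varying) axis and axes[2] the innermost.
    The middle axis reverses direction on every other outer slice and the innermost
    axis reverses on every other row, so consecutive voxels on the full grid path
    differ by one step along a single axis.
    """
    extents = [max(c[d] for c in cells) + 1 for d in range(3)]
    n_mid, n_in = extents[axes[1]], extents[axes[2]]

    def key(i: int) -> Tuple[int, int, int]:
        a, b, c = (cells[i][d] for d in axes)
        b_pos = b if a % 2 == 0 else n_mid - 1 - b
        row = a * n_mid + b_pos  # global index of the row along the path
        c_pos = c if row % 2 == 0 else n_in - 1 - c
        return (a, b_pos, c_pos)

    # sorted() is stable, so points that share a voxel keep their input order.
    return sorted(range(len(cells)), key=key)


def serialize_all_variants(points: Sequence[Point], grid_size: float) -> Dict[str, List[int]]:
    """Build one serialization per permutation of the (x, y, z) axes: six in total."""
    cells = quantize(points, grid_size)
    return {
        "".join("xyz"[d] for d in axes): serpentine_order(cells, axes)
        for axes in permutations(range(3))
    }


# Usage: a full 2x2x2 grid of points, one point per unit voxel.
grid = [(float(x), float(y), float(z)) for x in range(2) for y in range(2) for z in range(2)]
variants = serialize_all_variants(grid, grid_size=1.0)
print(variants["xyz"])  # [0, 1, 3, 2, 6, 7, 5, 4]
```

On this full grid the `xyz` variant visits the voxels in a Gray-code order, so every step in the sequence moves to a face-adjacent voxel. In a real, sparse cloud, empty voxels are skipped, so sequence neighbors in this sketch are close but not always touching.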
