3D Medical Point Transformer: Introducing Convolution to Attention Networks for Medical Point Cloud Analysis

General point clouds have been increasingly investigated for a variety of tasks, and Transformer-based networks have recently been proposed for point cloud analysis. However, few works address medical point clouds, which are important for disease detection and treatment. In this work, we propose an attention-based model designed specifically for medical point clouds, namely the 3D medical point Transformer (3DMedPT), to examine complex biological structures. By augmenting contextual information and summarizing local responses at the query, our attention module captures both local context and global content feature interactions. Because the limited number of training samples in medical data may lead to poor feature learning, we apply position embeddings to learn accurate local geometry and Multi-Graph Reasoning (MGR) to examine global knowledge propagation over channel graphs, enriching the feature representations. Experiments on the IntrA dataset demonstrate the superiority of 3DMedPT, which achieves the best classification and segmentation results. Furthermore, the generalization ability of our method is validated on general 3D point cloud benchmarks: ModelNet40 and ShapeNetPart. Code is released.
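
To make the idea of attention over a local neighbourhood with position embeddings more concrete, the following is a minimal NumPy sketch. It is not the 3DMedPT attention module and does not include Multi-Graph Reasoning: it assumes a plain k-nearest-neighbour grouping, a linear embedding of relative coordinates, and scaled dot-product attention. All names (`knn_indices`, `local_attention`, `w_q`, `w_k`, `w_v`, `w_pos`) are hypothetical and chosen only for illustration.

```python
import numpy as np


def knn_indices(points, k):
    """Indices of the k nearest neighbours of every point (the point itself included)."""
    # Pairwise squared Euclidean distances, shape (N, N).
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    return np.argsort(d2, axis=1)[:, :k]  # (N, k)


def local_attention(points, feats, k, w_q, w_k, w_v, w_pos):
    """Generic local attention with a relative position embedding (illustrative only).

    points: (N, 3) coordinates, feats: (N, C) per-point features.
    w_q, w_k, w_v: (C, D) projections; w_pos: (3, D) position embedding (assumed linear).
    """
    idx = knn_indices(points, k)                # (N, k)
    q = feats @ w_q                             # (N, D) one query per point
    keys = (feats @ w_k)[idx]                   # (N, k, D) keys of the neighbours
    vals = (feats @ w_v)[idx]                   # (N, k, D) values of the neighbours
    rel = points[idx] - points[:, None, :]      # (N, k, 3) neighbour offsets
    pos = rel @ w_pos                           # (N, k, D) embedded offsets

    # Scaled dot product between each query and its position-augmented keys.
    logits = ((keys + pos) * q[:, None, :]).sum(-1) / np.sqrt(q.shape[-1])
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(logits)
    attn = attn / attn.sum(axis=1, keepdims=True)        # softmax over the k neighbours

    # Summarize the neighbourhood response at each query point.
    out = (attn[..., None] * (vals + pos)).sum(axis=1)   # (N, D)
    return out, attn


rng = np.random.default_rng(0)
pts = rng.normal(size=(32, 3))
x = rng.normal(size=(32, 8))
w_q, w_k, w_v = (rng.normal(size=(8, 16)) for _ in range(3))
w_pos = rng.normal(size=(3, 16))
out, attn = local_attention(pts, x, k=4, w_q=w_q, w_k=w_k, w_v=w_v, w_pos=w_pos)
```

In this sketch each point attends only to its `k` nearest neighbours, so the position embedding carries the local geometry while the feature projections carry content. Global interactions would need an additional mechanism that is not shown here.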
