SE(3)-Transformers: 3D Roto-Translation Equivariant Attention Networks

We introduce the SE(3)-Transformer, a variant of the self-attention module for 3D point clouds, which is equivariant under continuous 3D roto-translations. Equivariance is important to ensure stable and predictable performance in the presence of nuisance transformations of the input data. A positive corollary of equivariance is increased weight-tying within the model, leading to fewer trainable parameters and thus decreased sample complexity (i.e. we need less training data). The SE(3)-Transformer leverages the benefits of self-attention to operate on large point clouds with varying numbers of points, while guaranteeing SE(3)-equivariance for robustness. We evaluate our model on a toy $N$-body particle simulation dataset, showcasing the robustness of the predictions under rotations of the input. We further achieve competitive performance on two real-world datasets, ScanObjectNN and QM9. In all cases, our model outperforms a strong, non-equivariant attention baseline and an equivariant model without attention.
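The rotation-equivariance property claimed above can be stated concretely: for a map $f$ on point clouds and any rotation $R$, we require $f(Rx) = R f(x)$. The sketch below is a minimal illustration of this condition using a toy per-point map (scaling each point by a rotation-invariant function of its norm); it is not the SE(3)-Transformer itself, and `toy_equivariant_layer` and `rotation_matrix` are hypothetical names introduced here for illustration only.

```python
import numpy as np

def rotation_matrix(axis, angle):
    """Rodrigues' formula: rotation by `angle` about the unit vector `axis`."""
    axis = axis / np.linalg.norm(axis)
    K = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def toy_equivariant_layer(points):
    """A toy rotation-equivariant map (NOT the paper's model):
    scale each point by a rotation-invariant function of its norm,
    so that f(x R^T) = f(x) R^T holds exactly."""
    norms = np.linalg.norm(points, axis=1, keepdims=True)
    return points * np.tanh(norms)

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))                      # a tiny random point cloud
R = rotation_matrix(np.array([1.0, 2.0, 3.0]), 0.7)

out_then_rotate = toy_equivariant_layer(x) @ R.T  # f(x), then rotate
rotate_then_out = toy_equivariant_layer(x @ R.T)  # rotate, then f(x)
print(np.allclose(out_then_rotate, rotate_then_out))  # prints True
```

A non-equivariant map (e.g. one that mixes raw coordinates through an arbitrary fixed matrix) would fail this check, which is precisely the failure mode the paper's equivariance guarantee rules out.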
