SE(3)-Transformers: 3D Roto-Translation Equivariant Attention Networks

We introduce the SE(3)-Transformer, a variant of the self-attention module for 3D point clouds, which is equivariant under continuous 3D roto-translations. Equivariance is important to ensure stable and predictable performance in the presence of nuisance transformations of the input data. A positive corollary of equivariance is increased weight-tying within the model, leading to fewer trainable parameters and thus decreased sample complexity (i.e. we need less training data). The SE(3)-Transformer leverages the benefits of self-attention to operate on large point clouds with varying numbers of points, while guaranteeing SE(3)-equivariance for robustness. We evaluate our model on a toy $N$-body particle simulation dataset, showcasing the robustness of the predictions under rotations of the input. We further achieve competitive performance on two real-world datasets, ScanObjectNN and QM9. In all cases, our model outperforms a strong, non-equivariant attention baseline and an equivariant model without attention.
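The rotation-equivariance property claimed above can be stated concretely: for a map $f$ on point clouds and any rotation $R$, we require $f(Rx) = R f(x)$. The sketch below is a minimal illustration of this condition using a toy per-point map (scaling each point by a rotation-invariant function of its norm); it is not the SE(3)-Transformer itself, and `toy_equivariant_layer` and `rotation_matrix` are hypothetical names introduced here for illustration only.

```python
import numpy as np

def rotation_matrix(axis, angle):
    """Rodrigues' formula: rotation by `angle` about the unit vector `axis`."""
    axis = axis / np.linalg.norm(axis)
    K = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def toy_equivariant_layer(points):
    """A toy rotation-equivariant map (NOT the paper's model):
    scale each point by a rotation-invariant function of its norm,
    so that f(x R^T) = f(x) R^T holds exactly."""
    norms = np.linalg.norm(points, axis=1, keepdims=True)
    return points * np.tanh(norms)

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))                      # a tiny random point cloud
R = rotation_matrix(np.array([1.0, 2.0, 3.0]), 0.7)

out_then_rotate = toy_equivariant_layer(x) @ R.T  # f(x), then rotate
rotate_then_out = toy_equivariant_layer(x @ R.T)  # rotate, then f(x)
print(np.allclose(out_then_rotate, rotate_then_out))  # prints True
```

A non-equivariant map (e.g. one that mixes raw coordinates through an arbitrary fixed matrix) would fail this check, which is precisely the failure mode the paper's equivariance guarantee rules out.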
