Very Deep Graph Neural Networks Via Noise Regularisation

Graph Neural Networks (GNNs) perform learned message passing over an input graph, but conventional wisdom holds that performing more than a handful of message passing steps makes training difficult and yields no improvement in performance. Here we show the contrary. We train a deep GNN with up to 100 message passing steps and achieve several state-of-the-art results on two challenging molecular property prediction benchmarks, Open Catalyst 2020 IS2RE and QM9. Our approach depends crucially on a novel but simple regularisation method, which we call “Noisy Nodes”: we corrupt the input graph with noise and, when the task is graph property prediction, add an auxiliary node autoencoder loss. Our results show that this regularisation allows the model to improve monotonically in performance as the number of message passing steps increases. Our work opens new opportunities for reaping the benefits of deep neural networks on graphs and other structured prediction problems.
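To make the method concrete, below is a minimal sketch of a Noisy Nodes training loss in PyTorch. The `model` interface returning both a graph-level prediction and a per-node denoising head, the noise scale `sigma`, and the auxiliary weight `aux_weight` are illustrative assumptions rather than the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def noisy_nodes_loss(model, positions, graph_target,
                     sigma=0.02, aux_weight=0.1):
    """Compute a training loss with Noisy Nodes regularisation.

    `model` is assumed to take perturbed node positions and return
    (graph_pred, node_pred): a graph-level prediction plus a per-node
    output head used for the auxiliary denoising target. `sigma` and
    `aux_weight` are hypothetical hyperparameters for illustration.
    """
    # Corrupt the input graph: add i.i.d. Gaussian noise to each node.
    noise = sigma * torch.randn_like(positions)
    graph_pred, node_pred = model(positions + noise)

    # Primary graph property prediction loss.
    primary = F.l1_loss(graph_pred, graph_target)

    # Auxiliary node autoencoder loss: regress the injected noise
    # (equivalently, reconstruct the clean node inputs).
    denoise = F.mse_loss(node_pred, noise)

    return primary + aux_weight * denoise
```

Whether the node head regresses the injected noise or reconstructs the clean inputs is an implementation choice; either way, the per-node target encourages node representations to remain diverse through many message passing steps, which is one way to counteract the oversmoothing that afflicts deep GNNs.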
