The Surprising Power of Graph Neural Networks with Random Node Initialization

Graph neural networks (GNNs) are effective models for representation learning on graph-structured data. However, standard GNNs are limited in their expressive power, as they cannot distinguish graphs beyond the capability of the Weisfeiler-Leman (1-WL) graph isomorphism heuristic. This limitation has motivated a large body of work, including higher-order GNNs, which are provably more powerful models. To date, higher-order invariant and equivariant networks are the only models with known universality results, but their practical use is hindered by prohibitive computational complexity. Thus, despite their limitations, standard GNNs are commonly used, due to their strong practical performance. In practice, GNNs have shown promising performance when enhanced with random node initialization (RNI), where the idea is to train and run the models with randomized initial node features. In this paper, we analyze the expressive power of GNNs with RNI, and pose the following question: are GNNs with RNI more expressive than GNNs? We prove that this is indeed the case, by showing that GNNs with RNI are universal, the first such result for GNNs that does not rely on computationally demanding higher-order properties. We then empirically analyze the effect of RNI on GNNs, based on carefully constructed datasets. Our empirical findings support the superior performance of GNNs with RNI over standard GNNs. In fact, we demonstrate that the performance of GNNs with RNI is often comparable to, or better than, that of higher-order GNNs, while retaining the much lower memory requirements of standard GNNs. However, this improvement typically comes at the cost of slower model convergence. Somewhat surprisingly, we find that both the convergence rate and the accuracy of the models can be improved by a partial random node initialization regime, in which only a portion of the initial node features is randomized.
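To make the idea concrete, the following is a minimal sketch of how RNI can be realized, not the authors' implementation: here RNI is read as concatenating freshly resampled random features to each node's initial features, and "partial RNI" as randomizing only a fraction of the appended dimensions. The names `apply_rni` and `SimpleGNNLayer`, and the mean-aggregation layer itself, are illustrative assumptions.

```python
import torch


def apply_rni(x: torch.Tensor, num_random_dims: int,
              random_fraction: float = 1.0) -> torch.Tensor:
    """Concatenate freshly sampled random features to node features.

    x: [num_nodes, num_features] initial node feature matrix.
    num_random_dims: number of extra dimensions to append.
    random_fraction: fraction of the appended dimensions that are
        actually randomized (partial RNI); the rest stay zero.
    """
    num_nodes = x.size(0)
    rand_feats = torch.zeros(num_nodes, num_random_dims)
    k = int(num_random_dims * random_fraction)
    if k > 0:
        # Resampled on every call, i.e., anew for each training/test pass.
        rand_feats[:, :k] = torch.randn(num_nodes, k)
    return torch.cat([x, rand_feats], dim=-1)


class SimpleGNNLayer(torch.nn.Module):
    """One mean-aggregation message-passing layer in plain PyTorch."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.linear = torch.nn.Linear(2 * in_dim, out_dim)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # adj: dense [num_nodes, num_nodes] adjacency matrix.
        deg = adj.sum(dim=-1, keepdim=True).clamp(min=1.0)
        neigh = adj @ x / deg  # mean over neighbors
        return torch.relu(self.linear(torch.cat([x, neigh], dim=-1)))


# Usage: a 4-node path graph, 3 original feature dims plus 2 random dims.
x = torch.ones(4, 3)
adj = torch.tensor([[0., 1., 0., 0.],
                    [1., 0., 1., 0.],
                    [0., 1., 0., 1.],
                    [0., 0., 1., 0.]])
layer = SimpleGNNLayer(in_dim=5, out_dim=8)  # 3 original + 2 random
out = layer(apply_rni(x, num_random_dims=2), adj)
print(out.shape)  # torch.Size([4, 8])
```

Because the random features are resampled on every forward pass, identically-attributed nodes that 1-WL cannot separate receive distinct inputs with high probability, which is what lifts the model beyond the 1-WL barrier.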
