On the Bottleneck of Graph Neural Networks and its Practical Implications

Graph neural networks (GNNs) have been shown to learn effectively from highly structured data containing elements (nodes) with relationships (edges) between them. GNN variants differ in how each node in the graph absorbs the information flowing from its neighbor nodes. In this paper, we highlight an inherent problem in GNNs: the mechanism of propagating information between neighbors creates a bottleneck when every node aggregates messages from its neighbors. This bottleneck causes the over-squashing of exponentially growing information into fixed-size vectors. As a result, GNNs fail to propagate messages flowing from distant nodes and perform poorly when the prediction task depends on long-range information. We demonstrate that the bottleneck hinders popular GNNs from fitting the training data. We show that GNNs that absorb incoming edges equally, like GCN and GIN, are more susceptible to over-squashing than other GNN types. We further show that existing, extensively tuned GNN-based models suffer from over-squashing, and that breaking the bottleneck improves state-of-the-art results without any hyperparameter tuning or additional weights.
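To make the phrase "exponentially growing information" concrete, the following is a minimal sketch that counts how many nodes fall within r hops of a target node, which is the number of nodes whose information must be compressed into that node's fixed-size vector after r message-passing layers. The complete binary tree, the helper names `build_binary_tree` and `receptive_field_sizes`, and the value of `HIDDEN_DIM` are illustrative assumptions, not taken from the paper.

```python
from collections import deque

HIDDEN_DIM = 32  # hypothetical fixed size of each node's representation


def build_binary_tree(depth):
    """Adjacency list of a complete binary tree; node 0 is the root."""
    num_nodes = 2 ** (depth + 1) - 1
    adj = {v: [] for v in range(num_nodes)}
    for child in range(1, num_nodes):
        parent = (child - 1) // 2
        adj[parent].append(child)
        adj[child].append(parent)
    return adj


def receptive_field_sizes(adj, source, max_hops):
    """Return the number of nodes within r hops of `source`, for r = 0..max_hops."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == max_hops:
            continue  # do not expand beyond the requested radius
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    # Count nodes at each exact distance, then accumulate.
    per_hop = [0] * (max_hops + 1)
    for d in dist.values():
        per_hop[d] += 1
    sizes, total = [], 0
    for count in per_hop:
        total += count
        sizes.append(total)
    return sizes


if __name__ == "__main__":
    tree = build_binary_tree(depth=6)
    for hops, size in enumerate(receptive_field_sizes(tree, source=0, max_hops=6)):
        # The receptive field grows with each layer; the vector size does not.
        print(f"layers={hops}  nodes in receptive field={size}  vector size={HIDDEN_DIM}")
```

In this toy tree the receptive field of the root roughly doubles with every added layer, while the representation it is squashed into stays at `HIDDEN_DIM` entries. Real graphs have different growth rates, so the sketch illustrates the trend rather than any result reported in the paper.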
