Generalization in Graph Neural Networks: Improved PAC-Bayesian Bounds on Graph Diffusion

Graph neural networks are widely used tools for graph prediction tasks. Motivated by their empirical performance, prior works have developed generalization bounds for graph neural networks that scale with the graph structure through the maximum node degree. In this paper, we present generalization bounds that instead scale with the largest singular value of the graph neural network's feature diffusion matrix. These bounds are numerically much smaller than prior bounds on real-world graphs. We also construct a lower bound on the generalization gap that matches our upper bound asymptotically. To achieve these results, we analyze a unified model that includes the settings of prior works (i.e., convolutional and message-passing networks) as well as new settings (i.e., graph isomorphism networks). Our key idea is to measure the stability of graph neural networks against noise perturbations using Hessians. Empirically, we find that Hessian-based measurements correlate accurately with the observed generalization gaps of graph neural networks. Optimizing the noise stability properties of pretrained graph neural networks during fine-tuning also improves test performance on several graph-level classification tasks.
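
Below is a minimal sketch, in PyTorch, of the two quantities the abstract centers on: the largest singular value of a feature diffusion matrix, and a perturbation-based proxy for Hessian-dependent noise stability. This is not the paper's code; the function names, the GCN-style choice of diffusion matrix, and the loss_fn interface are illustrative assumptions.

import torch

def diffusion_matrix(adj: torch.Tensor) -> torch.Tensor:
    # Symmetrically normalized diffusion D^{-1/2} (A + I) D^{-1/2}, the
    # feature-diffusion operator of GCN-style networks (one common choice;
    # the paper's unified model covers other message-passing variants).
    a = adj + torch.eye(adj.shape[0], dtype=adj.dtype)
    deg = a.sum(dim=1).clamp(min=1e-12)
    d_inv_sqrt = torch.diag(deg.pow(-0.5))
    return d_inv_sqrt @ a @ d_inv_sqrt

def largest_singular_value(m: torch.Tensor) -> float:
    # Spectral norm: the quantity the new bounds scale with, in place of
    # the maximum degree used by prior bounds.
    return torch.linalg.matrix_norm(m, ord=2).item()

@torch.no_grad()
def noise_stability_proxy(loss_fn, params, sigma=1e-3, n_samples=10):
    # Average loss increase under isotropic Gaussian weight noise. For small
    # sigma this is approximately (sigma^2 / 2) * trace(Hessian), a simple
    # stand-in for the Hessian-based stability measurements in the abstract.
    base = loss_fn(params)
    total = 0.0
    for _ in range(n_samples):
        noisy = [p + sigma * torch.randn_like(p) for p in params]
        total += (loss_fn(noisy) - base).item()
    return total / n_samples

On a real-world graph, comparing largest_singular_value(diffusion_matrix(adj)) against the maximum degree of adj illustrates why bounds stated in the former quantity can be numerically much smaller.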
