Wasserstein Barycenter Matching for Graph Size Generalization of Message Passing Neural Networks

Graph size generalization is hard for message passing neural networks (MPNNs): their graph-level classification performance degrades as graph size varies. Recent theoretical studies reveal that a slow, uncontrollable convergence rate with respect to graph size can adversely affect size generalization. To address the uncontrollable convergence rate caused by correlations across nodes in the underlying high-dimensional signal-generating space, we propose to use Wasserstein barycenters as a graph-level consensus that counteracts node-level correlations. Methodologically, we propose a Wasserstein barycenter matching (WBM) layer that represents an input graph by the Wasserstein distances between its MPNN-filtered node embeddings and a set of learned class-wise barycenters. Theoretically, we show that the convergence rate of an MPNN with a WBM layer is controllable and independent of the dimensionality of the signal-generating space; MPNNs with WBM layers are therefore less susceptible to slow, uncontrollable convergence and to size variation. Empirically, the WBM layer significantly improves size generalization over vanilla MPNNs with different backbones (e.g., GCN, GIN, and PNA) on real-world graph datasets.
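To make the construction concrete, below is a minimal PyTorch sketch of such a layer. It is an illustration under stated assumptions, not the paper's implementation: it assumes uniform weights on the node-embedding point cloud, substitutes entropy-regularized (Sinkhorn) optimal transport for exact Wasserstein distances, and all names (`WBMLayer`, `sinkhorn_distance`, `n_support`) are hypothetical.

```python
import torch
import torch.nn as nn

def sinkhorn_distance(x, y, reg=0.1, n_iters=50):
    """Entropy-regularized OT cost between two point clouds with uniform
    weights, computed via the Sinkhorn fixed-point iteration (assumption:
    a stand-in for the exact Wasserstein distance)."""
    n, m = x.size(0), y.size(0)
    cost = torch.cdist(x, y, p=2) ** 2                # squared-Euclidean costs
    K = torch.exp(-cost / reg)                        # Gibbs kernel
    a = torch.full((n,), 1.0 / n, device=x.device)    # uniform source weights
    b = torch.full((m,), 1.0 / m, device=x.device)    # uniform target weights
    u = torch.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.t() @ u + 1e-12)
        u = a / (K @ v + 1e-12)
    plan = u.unsqueeze(1) * K * v.unsqueeze(0)        # transport plan
    return (plan * cost).sum()

class WBMLayer(nn.Module):
    """Hypothetical WBM layer: embeds a graph as the vector of OT distances
    from its node-embedding cloud to one learned barycenter per class."""
    def __init__(self, n_classes, n_support, dim, reg=0.1):
        super().__init__()
        # One learnable barycenter support (a free point cloud) per class.
        self.barycenters = nn.Parameter(torch.randn(n_classes, n_support, dim))
        self.reg = reg

    def forward(self, node_embeddings):
        # node_embeddings: (n_nodes, dim), the MPNN output for one graph.
        return torch.stack([
            sinkhorn_distance(node_embeddings, bc, self.reg)
            for bc in self.barycenters
        ])

# Usage: the distance vector is size-invariant in length, so it can feed a
# standard linear classifier head regardless of the input graph's node count.
layer = WBMLayer(n_classes=2, n_support=8, dim=16)
h = torch.randn(30, 16)   # e.g., 30 node embeddings from an MPNN backbone
print(layer(h))           # tensor of 2 graph-level distances
```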
