DIFFormer: Scalable (Graph) Transformers Induced by Energy Constrained Diffusion

Real-world data generation often involves complex inter-dependencies among instances, violating the IID-data hypothesis of standard learning paradigms and making it challenging to uncover the geometric structures needed for learning desired instance representations. To this end, we introduce an energy constrained diffusion model that encodes a batch of instances from a dataset into evolutionary states that progressively incorporate other instances' information through their interactions. The diffusion process is constrained by descent criteria w.r.t. a principled energy function that characterizes the global consistency of instance representations over latent structures. We provide rigorous theory that implies closed-form optimal estimates for the pairwise diffusion strength between arbitrary instance pairs, which gives rise to a new class of neural encoders, dubbed DIFFormer (diffusion-based Transformers), with two instantiations: a simple version with linear complexity for prohibitively large numbers of instances, and an advanced version for learning complex structures. Experiments highlight the wide applicability of our model as a general-purpose encoder backbone, with superior performance on various tasks such as node classification on large graphs, semi-supervised image/text classification, and spatial-temporal dynamics prediction.
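The abstract does not spell out the update rule, so the following is only a hedged illustration of how an all-pair diffusion step can run in time linear in the number of instances. It assumes a pairwise weight of the form 1 + cosine similarity between instance states, row-normalised into diffusion strengths, and an explicit Euler step with step size `tau`. The function names, the weight choice, and `tau` are assumptions made for this sketch, not the paper's exact formulation. The linear version computes the same result as the quadratic one purely by reassociating matrix products, so the N x N weight matrix is never built.

```python
import numpy as np


def diffusion_step_quadratic(z, tau=0.5):
    """Reference O(N^2 d) step that builds the full N x N weight matrix.

    Hypothetical weight: w_ij = 1 + cos(z_i, z_j), which lies in [0, 2].
    Assumes no row of z is the zero vector.
    """
    zn = z / np.linalg.norm(z, axis=1, keepdims=True)  # L2-normalise each instance
    w = 1.0 + zn @ zn.T                                # non-negative pairwise weights
    s = w / w.sum(axis=1, keepdims=True)               # rows sum to 1 (diffusion strengths)
    return (1.0 - tau) * z + tau * (s @ z)             # explicit Euler diffusion step


def diffusion_step_linear(z, tau=0.5):
    """Same update in O(N d^2) time, never materialising the N x N matrix."""
    n = z.shape[0]
    zn = z / np.linalg.norm(z, axis=1, keepdims=True)
    # numerator_i = sum_j (1 + zn_i . zn_j) z_j = sum_j z_j + zn_i @ (zn^T z)
    numer = z.sum(axis=0, keepdims=True) + zn @ (zn.T @ z)
    # denominator_i = sum_j (1 + zn_i . zn_j) = N + zn_i . sum_j zn_j  (always >= 2)
    denom = n + zn @ zn.sum(axis=0)
    return (1.0 - tau) * z + tau * numer / denom[:, None]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.normal(size=(1000, 16))
    # Both routes should agree up to floating-point error.
    print(np.allclose(diffusion_step_quadratic(z), diffusion_step_linear(z)))
```

Because every weight is non-negative and each row is normalised, each updated state in this sketch is a convex combination of the current states; the cost of the linear route grows with N d^2 rather than N^2 d, which is what makes it usable when the instance count is large.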
