DyFormer: A Scalable Dynamic Graph Transformer with Provable Benefits on Generalization Ability

Transformers have achieved great success in several domains, including Natural Language Processing and Computer Vision. However, their application to real-world graphs is less explored, mainly because of their high computation cost and their poor generalizability, which stems from the lack of sufficient training data in the graph domain. To fill this gap, we propose a scalable Transformer-like dynamic graph learning method named Dynamic Graph Transformer (DyFormer), which uses spatial-temporal encoding to effectively learn graph topology and capture implicit links. To achieve efficient and scalable training, we propose a temporal-union graph structure and an associated subgraph-based node sampling strategy. To improve generalization, we introduce two complementary self-supervised pre-training tasks and show, via an information-theoretic analysis, that jointly optimizing them results in a smaller Bayesian error rate. Extensive experiments on real-world datasets show that DyFormer achieves a consistent 1% to 3% AUC gain (averaged over all time steps) compared with baselines on all benchmarks.
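As a rough illustration of the temporal-union graph and subgraph-based node sampling named above, the following minimal sketch merges per-time-step edge lists into one graph and collects the neighborhood of a target node. It is not the paper's implementation: the function names `temporal_union` and `sample_subgraph`, the undirected edge-list input, and the plain k-hop BFS sampling rule are all assumptions made here for illustration.

```python
from collections import defaultdict


def temporal_union(snapshots):
    """Merge per-time-step edge lists into one undirected graph.

    Assumption: each snapshot is a list of (u, v) pairs. The result maps
    node -> {neighbor: set of time steps at which the edge appears}.
    """
    adj = defaultdict(dict)
    for t, edges in enumerate(snapshots):
        for u, v in edges:
            adj[u].setdefault(v, set()).add(t)
            adj[v].setdefault(u, set()).add(t)
    return adj


def sample_subgraph(adj, target, num_hops):
    """Collect every node within num_hops of target in the union graph.

    Assumption: a plain breadth-first expansion stands in for whatever
    sampling rule the method actually uses.
    """
    seen = {target}
    frontier = [target]
    for _ in range(num_hops):
        next_frontier = []
        for u in frontier:
            for v in adj.get(u, {}):
                if v not in seen:
                    seen.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return seen


# Three toy snapshots; edge (0, 1) appears at time steps 0 and 2.
snapshots = [[(0, 1)], [(1, 2)], [(2, 3), (0, 1)]]
adj = temporal_union(snapshots)
nodes = sample_subgraph(adj, target=0, num_hops=2)
```

Keeping the set of time steps on each merged edge preserves temporal information that a temporal encoding could use, while the sampled node set bounds the input size for each target node.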
