Invariant Graph Transformer

Rationale discovery is the task of finding the subset of the input data that maximally supports the prediction for a downstream task. In the context of graph machine learning, a graph rationale is the critical subgraph of the given graph topology that fundamentally determines the prediction result; the remaining subgraph is called the environment subgraph. Graph rationalization can improve model performance because, by assumption, the mapping between the graph rationale and the prediction label is invariant. To ensure the discriminative power of the extracted rationale subgraphs, a key technique named "intervention" is applied. The core idea of intervention is that, under arbitrarily changing environment subgraphs, the semantics of the rationale subgraph remain invariant, which guarantees a correct prediction. However, most, if not all, existing rationalization works on graph data implement their intervention strategies at the graph level, which is coarse-grained. In this paper, we propose intervention strategies that are well tailored to graph data. Our idea is driven by the development of Transformer models, whose self-attention module provides rich interactions between input nodes. Building on the self-attention module, our proposed Invariant Graph Transformer (IGT) achieves fine-grained, namely node-level and virtual node-level, intervention. Our comprehensive experiments on 7 real-world datasets show that IGT offers significant performance advantages over 13 baseline methods.
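As a rough illustration of what node-level intervention through self-attention can look like, the PyTorch sketch below pairs the rationale nodes of one graph with the environment nodes of another and lets them interact in a shared attention layer; the rationale readout should then stay stable across donor environments. The class name NodeIntervention, the hard top-k split, the rationale_ratio parameter, and the use of nn.MultiheadAttention are illustrative assumptions for this sketch, not the paper's exact IGT implementation.

```python
# Minimal sketch of node-level intervention via self-attention (assumed design,
# not the authors' exact architecture). Node features are dense matrices X of
# shape (num_nodes, dim); graph structure encodings are omitted for brevity.
import torch
import torch.nn as nn


class NodeIntervention(nn.Module):
    def __init__(self, dim: int, rationale_ratio: float = 0.5, num_heads: int = 4):
        super().__init__()
        self.scorer = nn.Linear(dim, 1)          # scores each node's "rationale-ness"
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.ratio = rationale_ratio

    def split(self, x: torch.Tensor):
        """Split node features into rationale and environment parts by top-k score."""
        score = torch.sigmoid(self.scorer(x)).squeeze(-1)   # (N,) in [0, 1]
        k = max(1, int(self.ratio * x.size(0)))
        mask = torch.zeros(x.size(0), dtype=torch.bool, device=x.device)
        mask[score.topk(k).indices] = True
        return x[mask], x[~mask]                 # rationale nodes, environment nodes

    def forward(self, x_query: torch.Tensor, x_env_donor: torch.Tensor):
        """Intervene: combine the rationale of one graph with the environment of another."""
        rat, _ = self.split(x_query)             # rationale of the query graph
        _, env = self.split(x_env_donor)         # environment taken from a donor graph
        mixed = torch.cat([rat, env], dim=0).unsqueeze(0)    # (1, N', dim)
        out, _ = self.attn(mixed, mixed, mixed)  # node-level interaction via attention
        # Read out only the rationale positions; under the invariance assumption,
        # their semantics should not depend on which environment was attached.
        return out[0, : rat.size(0)].mean(dim=0)


if __name__ == "__main__":
    torch.manual_seed(0)
    layer = NodeIntervention(dim=16)
    g1, g2 = torch.randn(10, 16), torch.randn(12, 16)
    z_original = layer(g1, g1)                   # rationale with its own environment
    z_intervened = layer(g1, g2)                 # rationale with a foreign environment
    # A training objective would penalize the gap between the two readouts.
    print(torch.norm(z_original - z_intervened))
```

In a full model, this intervention loss would be combined with the ordinary task loss so that the extracted rationale is both predictive and insensitive to the environment it is paired with.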
