Kernel Ridge Regression-Based Graph Dataset Distillation

The huge volume of emerging graph datasets has become a double-edged sword for graph machine learning. On the one hand, it empowers the success of a myriad of graph neural networks (GNNs) with strong empirical performance. On the other hand, training modern GNNs on huge graph data is computationally expensive. How to distill a given graph dataset while retaining most of the trained models' performance is a challenging problem. Existing efforts approach this problem by solving meta-learning-based bilevel optimization objectives. A major hurdle is that the exact solutions of these objectives are computationally intensive; as a result, most, if not all, existing methods resort to approximate strategies, which in turn hurt distillation performance. In this paper, inspired by recent advances in neural network kernel methods, we adopt a kernel ridge regression-based meta-learning objective that admits a feasible exact solution. However, computing the graph neural tangent kernel (GNTK) is very expensive, especially in the context of dataset distillation. In response, we design a graph kernel, named LiteGNTK, tailored to the dataset distillation problem and closely related to the classic random walk graph kernel. Based on LiteGNTK, we propose an effective model named Kernel rIdge regression-based graph Dataset Distillation (KIDD) together with its variants. KIDD is efficient in both the forward and backward propagation processes, and it shows strong empirical performance on seven real-world datasets compared with state-of-the-art distillation methods. Thanks to its ability to find the exact solution of the distillation objective, the training graphs learned by KIDD can sometimes even outperform the original full training set while using as few as 1.65% of the training graphs.
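
To make the key idea concrete, below is a minimal sketch of a kernel ridge regression-based distillation objective: because the inner (model-fitting) problem has a closed-form solution, the outer loss on the real data can be evaluated exactly without iteratively training a network. This is an illustration under simplifying assumptions, not the paper's actual implementation; a plain linear kernel stands in for a graph kernel such as LiteGNTK, and all names (krr_predict, X_syn, etc.) are hypothetical.

```python
import numpy as np

def krr_predict(K_ts, K_ss, y_s, reg=1e-6):
    """Closed-form kernel ridge regression prediction.

    K_ss : kernel matrix among the small distilled (synthetic) set
    K_ts : kernel matrix between the real targets and the distilled set
    y_s  : labels of the distilled set
    The exact inner solution is K_ts @ (K_ss + reg * I)^(-1) @ y_s,
    so no iterative GNN training is needed inside the bilevel loop.
    """
    n = K_ss.shape[0]
    alpha = np.linalg.solve(K_ss + reg * np.eye(n), y_s)
    return K_ts @ alpha

# Toy usage: a linear kernel stands in for a graph kernel such as LiteGNTK.
rng = np.random.default_rng(0)
X_real, y_real = rng.normal(size=(100, 8)), rng.normal(size=(100, 1))
X_syn, y_syn = rng.normal(size=(5, 8)), rng.normal(size=(5, 1))  # distilled set (learnable in practice)
K_ss = X_syn @ X_syn.T
K_ts = X_real @ X_syn.T
outer_loss = np.mean((krr_predict(K_ts, K_ss, y_syn) - y_real) ** 2)
print(outer_loss)
```

In a full distillation method, the distilled examples (and optionally their labels) would be treated as learnable parameters and updated by differentiating this outer loss through the closed-form solve.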
