Geometric Knowledge Distillation: Topology Compression for Graph Neural Networks

We study a new paradigm of knowledge transfer that aims at encoding graph topological information into graph neural networks (GNNs) by distilling knowledge from a teacher GNN trained on a complete graph to a student GNN operating on a smaller or sparser graph. To this end, we revisit the connection between thermodynamics and the behavior of GNNs, based on which we propose the Neural Heat Kernel (NHK) to encapsulate the geometric properties of the underlying manifold with respect to the GNN architecture. A fundamental and principled solution, dubbed Geometric Knowledge Distillation, is derived by aligning the NHKs of the teacher and student models. We develop non-parametric and parametric instantiations and demonstrate their efficacy in various experimental settings covering different types of privileged topological information and teacher-student schemes.
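The abstract does not reproduce the paper's exact NHK formulation, but the alignment idea can be illustrated with a minimal sketch. The sketch below assumes the kernel is approximated from layer-wise node embeddings via a Gaussian (RBF) similarity, which is the Euclidean analogue of the heat kernel e^{-t\Delta}, and that teacher and student kernels are matched with a simple Frobenius-norm loss. All names here (`nhk_from_embeddings`, `geo_kd_loss`, the `lam` weight) are hypothetical, not the authors' implementation.

```python
# A minimal sketch of NHK-style distillation, assuming the neural heat
# kernel is approximated by pairwise similarities of layer-wise node
# embeddings. Illustrative only; not the paper's actual formulation.
import torch


def nhk_from_embeddings(h: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
    """Heat-kernel-like similarity matrix from node embeddings h of shape
    (num_nodes, dim): an RBF kernel e^{-||x - y||^2 / 4*tau}, the Euclidean
    heat kernel up to normalization."""
    sq_dist = torch.cdist(h, h).pow(2)        # pairwise squared distances
    return torch.exp(-sq_dist / (4.0 * tau))


def geo_kd_loss(teacher_feats: list, student_feats: list,
                tau: float = 1.0) -> torch.Tensor:
    """Align kernels of matched teacher/student layers with a mean
    squared (Frobenius-style) penalty; the teacher is kept frozen."""
    loss = torch.zeros(())
    for h_t, h_s in zip(teacher_feats, student_feats):
        k_t = nhk_from_embeddings(h_t.detach(), tau)  # no teacher gradients
        k_s = nhk_from_embeddings(h_s, tau)
        loss = loss + (k_t - k_s).pow(2).mean()
    return loss


# Hypothetical usage: the student sees the sparse graph, the teacher the
# full graph, and the task loss is augmented with the kernel alignment.
#
# logits, student_feats = student(x, edge_index_sparse)
# with torch.no_grad():
#     _, teacher_feats = teacher(x, edge_index_full)
# loss = torch.nn.functional.cross_entropy(logits[mask], y[mask]) \
#        + lam * geo_kd_loss(teacher_feats, student_feats)
```

The RBF kernel and Frobenius alignment are the simplest stand-ins one could choose; the paper's parametric instantiation would replace the fixed kernel with a learnable one tied to the GNN layers, which this sketch does not attempt.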
