Distilling Knowledge From Graph Convolutional Networks

Existing knowledge distillation methods focus on convolutional neural networks (CNNs), whose inputs, such as images, lie on a grid domain, and have largely overlooked graph convolutional networks (GCNs), which handle non-grid data. In this paper, we propose what is, to the best of our knowledge, the first dedicated approach to distilling knowledge from a pre-trained GCN model. To enable knowledge transfer from the teacher GCN to the student, we propose a local structure preserving module that explicitly accounts for the topological semantics of the teacher. In this module, the local structure information of both the teacher and the student is encoded as distributions, so that minimizing the distance between these distributions enables topology-aware knowledge transfer from the teacher and yields a compact yet high-performance student model. Moreover, the proposed approach readily extends to dynamic graph models, where the input graphs of the teacher and the student may differ. We evaluate the proposed method on two datasets using GCN models of different architectures, and demonstrate that it achieves state-of-the-art knowledge distillation performance for GCN models.
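
The abstract only sketches the local structure preserving module, so the following is a minimal PyTorch illustration of the core idea: each node's similarities to its neighbours are turned into a distribution, and the student is trained to match the teacher's distributions. The RBF kernel, the KL divergence, the dense adjacency mask, and the function name `local_structure_kl` are assumptions made for this sketch, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def local_structure_kl(h_t, h_s, adj, sigma=1.0, eps=1e-8):
    """KL divergence between teacher and student local-structure distributions.

    h_t : [N, d_t] node embeddings from the (frozen) teacher GCN
    h_s : [N, d_s] node embeddings from the student GCN
    adj : [N, N] boolean adjacency mask; this sketch assumes every node has
          at least one neighbour (e.g. self-loops added), otherwise its
          softmax row is undefined.
    """
    def neighbour_distribution(h):
        # Pairwise squared distances between embeddings, mapped through an
        # RBF kernel and normalised over each node's neighbourhood.
        d2 = torch.cdist(h, h).pow(2)                      # [N, N]
        logits = -d2 / (2.0 * sigma ** 2)
        logits = logits.masked_fill(~adj, float('-inf'))   # non-neighbours get zero mass
        return F.softmax(logits, dim=-1)

    p_t = neighbour_distribution(h_t).detach()  # teacher target, no gradient
    p_s = neighbour_distribution(h_s)
    # Per-node KL(p_t || p_s), summed over neighbours, averaged over nodes.
    kl = p_t * (torch.log(p_t + eps) - torch.log(p_s + eps))
    return kl.sum(dim=-1).mean()
```

In training, this term would be added to the student's task loss, e.g. `loss = ce_loss + beta * local_structure_kl(h_t, h_s, adj)` with a hypothetical trade-off weight `beta`. Note that the distances are computed within each model's own embedding space, so the teacher and student distributions are defined over the same neighbourhoods without requiring a shared feature dimension.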
