Deep Wasserstein Graph Discriminant Learning for Graph Classification

Graph topological structures are crucial for distinguishing graphs of different classes. In this work, we propose a deep Wasserstein graph discriminant learning (WGDL) framework that learns discriminative graph embeddings in a Wasserstein-metric (W-metric) matching space. To bypass the calculation of W-metric class centers in discriminant analysis, and to better support batch-wise learning, we introduce a reference set of graphs (a graph dictionary) whose elements (dictionary keys) serve as representative graph samples. Using the graph dictionary as a bridge, every input graph is projected into the latent dictionary space through our proposed Wasserstein graph transformation (WGT). In WGT, we formulate the inter-graph distance in W-metric space by virtue of the optimal transport (OT) principle, which effectively expresses the correlations between cross-graph structures. To improve the representation ability of WGDL, we dynamically update the graph dictionary during training by maximizing the Wasserstein discriminant loss, i.e., the ratio of inter-class to intra-class Wasserstein distance. To evaluate WGDL, we conduct comprehensive experiments on six graph classification datasets. The experimental results demonstrate the effectiveness of WGDL, which achieves state-of-the-art performance.
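The projection of a graph onto a dictionary and the inter-class versus intra-class ratio can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each graph is given as a matrix of node features with uniform node weights, that the ground cost is squared Euclidean distance, that entropy-regularized Sinkhorn iterations stand in for the OT solver, and that each dictionary key carries a class label. The names `sinkhorn_distance`, `wgt_embed`, and `discriminant_ratio` are hypothetical, and the dictionary here is fixed rather than learned.

```python
import numpy as np


def sinkhorn_distance(X, Y, eps=0.1, n_iters=200):
    """Entropy-regularized OT cost between two node-feature sets (assumed solver)."""
    n, m = X.shape[0], Y.shape[0]
    a = np.full(n, 1.0 / n)  # uniform mass on the nodes of the first graph
    b = np.full(m, 1.0 / m)  # uniform mass on the nodes of the second graph
    # Pairwise squared Euclidean cost between node features.
    C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    # Scale the regularizer by the largest cost to avoid numerical underflow.
    K = np.exp(-C / (eps * C.max() + 1e-12))
    u = np.ones(n)
    for _ in range(n_iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]  # approximate transport plan
    return float((P * C).sum())


def wgt_embed(graph, dictionary, **kwargs):
    """Represent a graph by its OT distances to every dictionary key."""
    return np.array([sinkhorn_distance(graph, key, **kwargs) for key in dictionary])


def discriminant_ratio(graphs, labels, dictionary, key_labels, **kwargs):
    """Mean inter-class distance divided by mean intra-class distance."""
    intra, inter = [], []
    for graph, label in zip(graphs, labels):
        distances = wgt_embed(graph, dictionary, **kwargs)
        for dist, key_label in zip(distances, key_labels):
            (intra if key_label == label else inter).append(dist)
    return float(np.mean(inter) / (np.mean(intra) + 1e-12))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy graphs: node features clustered around a class-specific center.
    dictionary = [0.1 * rng.standard_normal((5, 2)),
                  3.0 + 0.1 * rng.standard_normal((6, 2))]
    key_labels = [0, 1]
    graphs = [0.1 * rng.standard_normal((4, 2)),
              3.0 + 0.1 * rng.standard_normal((7, 2))]
    labels = [0, 1]
    print(wgt_embed(graphs[0], dictionary))
    print(discriminant_ratio(graphs, labels, dictionary, key_labels))
```

In a trainable version, the dictionary keys would be parameters updated by gradient ascent on this ratio; the sketch only evaluates it for a fixed dictionary.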
