node2coords: Graph Representation Learning with Wasserstein Barycenters

Network analysis tasks require representations that capture the most relevant information in the graph structure. However, existing methods learn representations that are hard to interpret and relatively unstable to perturbations of the graph structure. We address these two limitations by proposing node2coords, a representation learning algorithm for graphs that simultaneously learns a low-dimensional space and coordinates for the nodes in that space. The patterns that span the low-dimensional space reveal the graph's most important structural information, while the coordinates of the nodes reveal the proximity of their local structure to those structural patterns. We measure this proximity with Wasserstein distances, which take into account the properties of the underlying graph. To this end, we introduce an autoencoder that employs a linear layer in the encoder and a novel Wasserstein barycentric layer in the decoder. Node connectivity descriptors, which capture the local structure of the nodes, are passed through the encoder to learn a small set of graph structural patterns. In the decoder, each node connectivity descriptor is reconstructed as a Wasserstein barycenter of the graph structural patterns, and the optimal weights of that barycentric representation correspond to the coordinates of the node in the low-dimensional space. Experimental results demonstrate that the representations learned with node2coords are interpretable, are stable to perturbations of the graph structure, and achieve competitive or superior results compared to state-of-the-art unsupervised methods in node classification.
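
To make the architecture concrete, the sketch below gives one plausible reading of this pipeline in PyTorch. A Wasserstein barycenter of patterns z_1, ..., z_m with weights lambda_1, ..., lambda_m is the histogram p minimizing the weighted sum of Wasserstein distances sum_j lambda_j W(z_j, p); in the entropy-regularized setting it can be computed by iterative Bregman projections, which the decoder unrolls so that the layer stays differentiable. This is not the authors' released code: the names (entropic_barycenter, Node2CoordsSketch), the softmax parameterizations that keep patterns and coordinates on the simplex, the Gibbs kernel built from a graph ground cost, and the fixed number of Sinkhorn-style iterations are all illustrative assumptions.

    import torch

    def entropic_barycenter(Z, lam, K, n_iter=50, eps=1e-16):
        # Entropy-regularized Wasserstein barycenter of the columns of Z
        # (each an n-bin histogram over the nodes) with simplex weights lam,
        # computed by unrolled iterative Bregman projections so the output
        # stays differentiable with respect to both Z and lam.
        # K is the Gibbs kernel exp(-C / gamma) for a ground cost C on the
        # graph (e.g., shortest-path or diffusion distances between nodes).
        b = torch.ones_like(Z)
        for _ in range(n_iter):
            phi = K.t() @ (Z / (K @ b + eps))               # (n, m)
            log_p = (lam * torch.log(phi + eps)).sum(dim=1, keepdim=True)
            p = torch.exp(log_p)                            # geometric mean, (n, 1)
            b = p / (phi + eps)
        return p.squeeze(1)

    class Node2CoordsSketch(torch.nn.Module):
        # Hypothetical autoencoder in the spirit of node2coords: a linear
        # encoder produces m structural patterns from the n connectivity
        # descriptors; the decoder reconstructs each descriptor as a
        # Wasserstein barycenter of those patterns, and the barycentric
        # weights serve as the node coordinates.
        def __init__(self, n_nodes, n_patterns, K):
            super().__init__()
            self.W = torch.nn.Parameter(torch.randn(n_nodes, n_patterns))
            self.coord_logits = torch.nn.Parameter(torch.randn(n_nodes, n_patterns))
            self.K = K

        def forward(self, S):
            # S: (n, n) matrix whose column i is the connectivity descriptor
            # of node i, normalized to sum to one.
            Z = torch.softmax(S @ self.W, dim=0)            # patterns as histograms
            lam = torch.softmax(self.coord_logits, dim=1)   # node coordinates on the simplex
            recon = torch.stack(
                [entropic_barycenter(Z, lam[i], self.K) for i in range(S.shape[1])],
                dim=1)
            return recon, Z, lam

A training loop would then minimize a divergence (for instance, a KL divergence) between each connectivity descriptor and its barycentric reconstruction; because the Bregman iterations are unrolled, gradients flow to both the encoder weights that shape the patterns and the logits that encode the node coordinates.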
