Exploring the Semantic Content of Unsupervised Graph Embeddings: An Empirical Study

Graph embeddings have become a widely used technique in graph mining, with successful applications to social, citation, transportation and biological networks. Unsupervised graph embedding techniques aim to automatically create a low-dimensional representation of a given graph that captures its key structural elements in the resulting embedding space. To date, however, little work has explored exactly which topological structures the embeddings learn, even though such an understanding could bring interpretability to the process. In this paper, we investigate whether graph embeddings approximate something analogous to traditional vertex-level graph features. If such a relationship exists, it could provide theoretical insight into how graph embedding approaches function. We perform this investigation by predicting known topological features directly from the embedding space, using both supervised and unsupervised methods. If a mapping between the embeddings and the topological features can be found, then we argue that the structural information encapsulated by those features is represented in the embedding space. To explore this, we present an extensive experimental evaluation of five state-of-the-art unsupervised graph embedding techniques across a range of empirical graph datasets, measuring a selection of topological features. We demonstrate that several topological features are indeed approximated in the embedding space, offering insight into how graph embeddings create good representations.
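
The probing idea described above can be illustrated with a minimal sketch. It is not the study's pipeline: the abstract does not name the embedding techniques, features, or models used, so everything here is an assumption made for illustration. The helper names (`random_graph`, `toy_embedding`, `loo_knn_accuracy`) are hypothetical, the embedding is a toy random projection of one-hop and two-hop walk counts, the topological feature is vertex degree binned at the median, and the supervised probe is a leave-one-out k-nearest-neighbour classifier.

```python
import math
import random


def random_graph(n, m, rng):
    """Hypothetical helper: each new vertex links to up to m earlier vertices."""
    adj = [set() for _ in range(n)]
    for v in range(1, n):
        for u in rng.sample(range(v), min(m, v)):
            adj[v].add(u)
            adj[u].add(v)
    return adj


def toy_embedding(adj, dim, rng):
    """Toy stand-in for an unsupervised embedding (an assumption, not one of
    the techniques evaluated in the paper): a fixed random projection of each
    vertex's one-hop and two-hop walk counts."""
    n = len(adj)
    proj = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]
    emb = []
    for v in range(n):
        counts = [0.0] * n
        for u in adj[v]:
            counts[u] += 1.0          # one-hop neighbours, weight 1
            for w in adj[u]:
                counts[w] += 0.5      # endpoints of two-hop walks, weight 0.5
        emb.append(
            [sum(counts[u] * proj[u][k] for u in range(n)) for k in range(dim)]
        )
    return emb


def loo_knn_accuracy(vectors, labels, k=3):
    """Leave-one-out k-NN accuracy when predicting labels from vectors."""
    correct = 0
    for i, x in enumerate(vectors):
        dists = sorted(
            (math.dist(x, y), j) for j, y in enumerate(vectors) if j != i
        )
        votes = {}
        for _, j in dists[:k]:
            votes[labels[j]] = votes.get(labels[j], 0) + 1
        # Ties go to the smallest label so the result is deterministic.
        pred = max(sorted(votes), key=votes.get)
        if pred == labels[i]:
            correct += 1
    return correct / len(vectors)


rng = random.Random(0)
adj = random_graph(60, 2, rng)
emb = toy_embedding(adj, 8, rng)

# Topological feature to probe for: degree, binned at the median.
degrees = [len(nbrs) for nbrs in adj]
median = sorted(degrees)[len(degrees) // 2]
labels = [int(d > median) for d in degrees]

accuracy = loo_knn_accuracy(emb, labels)
baseline = max(labels.count(0), labels.count(1)) / len(labels)
print(f"probe accuracy={accuracy:.2f}, majority baseline={baseline:.2f}")
```

Under these assumptions, a probe accuracy clearly above the majority baseline would suggest that degree information is recoverable from the embedding, which is the style of argument the abstract makes. A real experiment would replace the toy embedding with an actual embedding method and use held-out evaluation on empirical graphs.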
