Network Representation Learning: Consolidation and Renewed Bearing

Graphs are a natural abstraction for many problems in which nodes represent entities and edges represent relationships between entities. An important area of research that has emerged over the last decade is the use of graphs as a vehicle for non-linear dimensionality reduction, in a manner akin to earlier efforts based on manifold learning, with uses in downstream database processing, machine learning, and visualization. In this systematic and comprehensive experimental survey, we benchmark several popular network representation learning methods on two key tasks: link prediction and node classification. We examine the performance of 12 unsupervised embedding methods on 15 datasets. To the best of our knowledge, the scale of our study, in terms of both the number of methods and the number of datasets, is the largest to date. Our results reveal several key insights about work to date in this space. First, we find that certain baseline methods (task-specific heuristics as well as classic manifold methods) that have often been dismissed or not considered by previous efforts can compete on certain types of datasets if they are tuned appropriately. Second, we find that recent methods based on matrix factorization offer a small but relatively consistent advantage over alternative methods (e.g., random-walk-based methods) from a qualitative standpoint. Specifically, MNMF, a community-preserving embedding method, is the most competitive method for the link prediction task, while NetMF is the most competitive baseline for node classification. Third, no single method completely outperforms the other embedding methods on both node classification and link prediction. We also present several drill-down analyses that reveal the settings under which certain algorithms perform well (e.g., the role of neighborhood context in performance), guiding the end user.
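To make the notion of a task-specific heuristic baseline for link prediction concrete, the following is a minimal sketch, not the benchmark code used in the study. It assumes three commonly used neighborhood heuristics (common neighbors, Jaccard coefficient, and Adamic-Adar); the abstract does not say which heuristics were evaluated, and the toy graph and all function names are hypothetical.

```python
import math
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)} from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def common_neighbors(adj, u, v):
    """Number of neighbors shared by u and v."""
    return len(adj[u] & adj[v])


def jaccard(adj, u, v):
    """Shared neighbors divided by the size of the combined neighborhood."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


def adamic_adar(adj, u, v):
    """Sum of 1 / log(degree) over shared neighbors; low-degree neighbors count more."""
    return sum(
        1.0 / math.log(len(adj[w]))
        for w in adj[u] & adj[v]
        if len(adj[w]) > 1  # guard against log(1) == 0
    )


def rank_candidates(adj, score):
    """Score every currently unlinked node pair and sort from most to least likely."""
    pairs = [
        (u, v)
        for u, v in combinations(sorted(adj), 2)
        if v not in adj[u]
    ]
    scored = [((u, v), score(adj, u, v)) for u, v in pairs]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical toy graph, used only for illustration.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
adj = build_adjacency(edges)

for name, fn in [("common neighbors", common_neighbors),
                 ("Jaccard", jaccard),
                 ("Adamic-Adar", adamic_adar)]:
    best_pair, best_score = rank_candidates(adj, fn)[0]
    print(f"{name}: top candidate {best_pair} with score {best_score:.3f}")
```

Heuristics of this kind need no training and have no embedding dimension to tune, which is one reason they make a cheap reference point when judging learned embeddings on link prediction.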
