Delving Into Deep Walkers: A Convergence Analysis of Random-Walk-Based Vertex Embeddings

Graph vertex embeddings based on random walks have become increasingly influential in recent years. They perform well on a range of tasks because they efficiently transform a graph into a more computationally digestible format while preserving relevant information. However, the theoretical properties of such algorithms, in particular the influence of hyperparameters and of the graph structure on their convergence behaviour, are not yet well understood. In this work, we provide a theoretical analysis of random-walk-based embedding techniques. Firstly, we prove that, under weak assumptions, vertex embeddings derived from random walks do indeed converge. This holds both in the single limit where the number of random walks $N \to \infty$ and in the double limit where $N$ and the length $L$ of each random walk both tend to infinity. Secondly, we derive concentration bounds quantifying the convergence rate of the corpora in the single and double limits. Thirdly, we use these results to derive a heuristic for choosing the hyperparameters $N$ and $L$. We validate and illustrate the practical importance of our findings with a range of numerical and visual experiments on several graphs drawn from real-world applications.
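To make the single limit $N \to \infty$ concrete, the Python sketch below compares the co-occurrence statistics of a finite random-walk corpus with their exact expectation for a fixed walk length $L$. It is an illustration under stated assumptions, not the analysis or code of this work. It assumes DeepWalk-style uniform random walks, one walk started at every vertex per round (so $N$ rounds in total), a symmetric context window of hypothetical size `window`, and normalised (centre, context) pair frequencies as the corpus statistic. All function names are hypothetical.

```python
import random
from collections import Counter

# Sketch under assumptions: uniform (DeepWalk-style) walks, one walk per start
# vertex per round, symmetric context window; all names here are hypothetical.


def transition_matrix(adj, nodes):
    """Row-stochastic matrix of the simple random walk on an undirected graph."""
    idx = {v: k for k, v in enumerate(nodes)}
    P = [[0.0] * len(nodes) for _ in nodes]
    for v in nodes:
        for u in adj[v]:
            P[idx[v]][idx[u]] += 1.0 / len(adj[v])
    return P


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def window_positions(walk_length, window):
    """All (centre, context) position pairs inside a symmetric window."""
    return [(i, j) for i in range(walk_length)
            for j in range(max(0, i - window), min(walk_length, i + window + 1))
            if j != i]


def expected_cooccurrence(adj, walk_length, window):
    """Exact N -> infinity limit of the normalised pair frequencies for fixed
    walk length, with each round starting one walk at every vertex."""
    nodes = sorted(adj)
    n = len(nodes)
    P = transition_matrix(adj, nodes)
    powers = [[[float(r == c) for c in range(n)] for r in range(n)]]  # P^0 = I
    for _ in range(walk_length - 1):
        powers.append(matmul(powers[-1], P))
    # Distribution of the vertex at position i: column means of P^i.
    marg = [[sum(Pk[s][x] for s in range(n)) / n for x in range(n)] for Pk in powers]
    positions = window_positions(walk_length, window)
    freq = Counter()
    for i, j in positions:
        a, d = min(i, j), abs(i - j)
        for x in range(n):
            for y in range(n):
                p = marg[a][x] * powers[d][x][y]  # P(X_a = x, X_{a+d} = y)
                if p > 0:
                    pair = (nodes[x], nodes[y]) if i < j else (nodes[y], nodes[x])
                    freq[pair] += p / len(positions)
    return dict(freq)


def sample_walks(adj, num_walks, walk_length, seed=0):
    """num_walks rounds, each starting one uniform random walk at every vertex."""
    rng = random.Random(seed)
    walks = []
    for _ in range(num_walks):
        for start in sorted(adj):
            walk = [start]
            while len(walk) < walk_length:
                walk.append(rng.choice(adj[walk[-1]]))
            walks.append(walk)
    return walks


def empirical_cooccurrence(walks, window):
    """Normalised (centre, context) pair frequencies of a finite corpus."""
    counts = Counter()
    for walk in walks:
        for i, j in window_positions(len(walk), window):
            counts[(walk[i], walk[j])] += 1
    total = sum(counts.values())
    return {pair: c / total for pair, c in counts.items()}


def max_deviation(p, q):
    return max(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))


if __name__ == "__main__":
    adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}  # triangle plus a pendant vertex
    L, w = 10, 2
    limit = expected_cooccurrence(adj, L, w)
    for N in (10, 100, 1000):
        emp = empirical_cooccurrence(sample_walks(adj, N, L), w)
        print(f"N={N:5d}  max deviation = {max_deviation(emp, limit):.4f}")
```

Run as a script, it prints the maximum deviation between the finite-corpus and limiting frequencies for $N = 10, 100, 1000$. A law-of-large-numbers argument suggests it should shrink roughly like $1/\sqrt{N}$, although the exact values depend on the seed. Pure-Python matrix products keep the sketch dependency-free; for larger graphs a sparse-matrix library would be the natural choice.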