An Information-Explainable Random Walk Based Unsupervised Network Representation Learning Framework on Node Classification Tasks

Network representation learning aims to learn low-dimensional, compressed, and distributed representation vectors for the nodes of a network. Because obtaining label information for nodes is expensive, many unsupervised network representation learning methods have been proposed, among which random walk strategies are one of the most widely used approaches. However, existing random walk based methods face several challenges: (1) they do not adequately explain what network knowledge the sampled walk paths capture; (2) the mixture of different kinds of information in a network can adversely affect the learned representations; and (3) methods that rely on hyper-parameters generalize poorly across different networks. This paper proposes an information-explainable, random walk based unsupervised network representation learning framework named Probabilistic Accepted Walk (PAW), which obtains network representations from the perspective of the stationary distribution of a network. Within the framework, we design two stationary distributions, based on nodes' self-information and on the local information of the network, to guide the proposed random walk strategy, which learns representation vectors of the network by sampling node paths. Extensive experimental results demonstrate that PAW obtains more expressive representations than six widely used unsupervised network representation learning baselines on four real-world networks, in both single-label and multi-label node classification tasks.
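The abstract describes guiding a random walk so that its stationary distribution matches a designed target distribution over nodes. A standard way to achieve this is a Metropolis-Hastings acceptance step on top of uniform neighbor proposals. The sketch below is illustrative only: the toy graph, the helper names, and the concrete choice of self-information as the negative log of a node's degree fraction are all assumptions for illustration, not the paper's actual definitions.

```python
import math
import random

# Toy undirected graph as adjacency lists (hypothetical example data).
graph = {
    0: [1, 2],
    1: [0, 2, 3],
    2: [0, 1],
    3: [1],
}

def self_information(node, graph):
    """Assumed concrete choice: -log of the node's share of total degree.
    The paper's exact self-information definition may differ."""
    total_degree = sum(len(nbrs) for nbrs in graph.values())
    return -math.log(len(graph[node]) / total_degree)

def probabilistic_accepted_walk(graph, start, length, pi, seed=0):
    """Metropolis-Hastings-style walk: propose a uniform random neighbor
    and accept it with probability min(1, pi(v)*deg(u) / (pi(u)*deg(v))),
    which corrects for the asymmetric proposal q(u->v) = 1/deg(u) so that
    the walk's stationary distribution is proportional to pi."""
    rng = random.Random(seed)
    path = [start]
    cur = start
    for _ in range(length - 1):
        proposal = rng.choice(graph[cur])
        accept = min(1.0, (pi(proposal) * len(graph[cur]))
                          / (pi(cur) * len(graph[proposal])))
        if rng.random() < accept:
            cur = proposal  # move to the proposed neighbor
        # on rejection the walk stays at the current node
        path.append(cur)
    return path

path = probabilistic_accepted_walk(graph, 0, 20,
                                   lambda v: self_information(v, graph))
```

The resulting node paths would then feed a Skip-gram-style model (as in DeepWalk or node2vec) to produce the representation vectors; the acceptance step is what biases the sampled paths toward the chosen stationary distribution.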
