Attributed network embedding via subspace discovery

Network embedding aims to learn latent, low-dimensional vector representations of network nodes that are effective in supporting various network analytic tasks. While prior work on network embedding focuses primarily on preserving network topology to learn node representations, recently proposed attributed network embedding algorithms attempt to integrate rich node content information with network topological structure to enhance the quality of network embedding. In reality, networks often have sparse content and incomplete node attributes, and the node attribute feature space often disagrees with the network structure space, all of which severely degrade the performance of existing methods. In this paper, we propose a unified framework for attributed network embedding, attri2vec, that learns node embeddings by discovering a latent node attribute subspace via a network-structure-guided transformation performed on the original attribute space. The resultant latent subspace respects network structure more consistently, which supports learning high-quality node representations. We formulate an optimization problem that is solved by an efficient stochastic gradient descent algorithm whose time complexity is linear in the number of nodes. We investigate a series of linear and non-linear transformations performed on node attributes and empirically validate their effectiveness on various types of networks. Another advantage of attri2vec is its ability to solve out-of-sample problems, where embeddings of newly arriving nodes can be inferred from their node attributes through the learned mapping function. Experiments on various types of networks confirm that attri2vec is superior to state-of-the-art baselines for node classification, node clustering, and out-of-sample link prediction tasks. The source code of this paper is available at https://github.com/daokunzhang/attri2vec.
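To make the idea of a structure-guided mapping from attributes to embeddings concrete, the following is a minimal sketch and not the released attri2vec implementation. It assumes a purely linear transformation, a skip-gram-style objective with negative sampling, and precomputed (node, context) pairs such as those produced by random walks; none of these specifics are stated in the abstract. The names `train_attribute_mapping` and `embed_new_nodes` are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_attribute_mapping(X, pairs, dim=8, negatives=2, lr=0.05,
                            epochs=50, seed=0):
    """Learn a linear map W from attributes to embeddings (illustrative only).

    X     : (n_nodes, n_attrs) attribute matrix
    pairs : iterable of (node, context) index pairs, assumed to come from
            the network structure (e.g. random-walk co-occurrences)
    """
    rng = np.random.default_rng(seed)
    n_nodes, n_attrs = X.shape
    W = rng.normal(scale=0.1, size=(n_attrs, dim))  # attribute -> embedding
    C = np.zeros((n_nodes, dim))                    # one context vector per node
    for _ in range(epochs):
        for i, j in pairs:
            h = X[i] @ W  # embedding of node i is a function of its attributes
            # One observed context (label 1) plus uniformly drawn negatives
            # (label 0). A negative may coincide with j; ignored in this sketch.
            targets = [(j, 1.0)]
            targets += [(int(k), 0.0)
                        for k in rng.integers(0, n_nodes, size=negatives)]
            grad_h = np.zeros(dim)
            for c, label in targets:
                g = sigmoid(h @ C[c]) - label  # d(logistic loss)/d(score)
                grad_h += g * C[c]
                C[c] -= lr * g * h
            # Chain rule through h = X[i] @ W.
            W -= lr * np.outer(X[i], grad_h)
    return W


def embed_new_nodes(X_new, W):
    """Out-of-sample inference: apply the learned map to unseen attributes."""
    return X_new @ W


# Toy usage: six nodes on a ring with random four-dimensional attributes.
rng = np.random.default_rng(1)
X = rng.random((6, 4))
pairs = [(i, (i + 1) % 6) for i in range(6)] + [((i + 1) % 6, i) for i in range(6)]
W = train_attribute_mapping(X, pairs)
new_embeddings = embed_new_nodes(rng.random((2, 4)), W)
```

Because each embedding is computed from attributes alone, a node that was absent during training needs only its attribute vector to be embedded. Each update touches one training pair and a fixed number of negatives, so the cost of an epoch grows with the number of pairs rather than with the square of the number of nodes.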
