SENA: Preserving Social Structure for Network Embedding

Network embedding transforms a network into a continuous feature space. Network augmentation, in turn, uses this feature representation to obtain a more informative network by adding plausible edges and removing noisy ones. Traditional network embedding methods often fail to capture (i) latent relationships when the network is sparse (the network sparsity problem) and (ii) the local and global neighborhood structure of vertices (the structure-preserving problem). We propose SENA, a structural embedding and network augmentation framework for social network analysis. Unlike other embedding methods, which generate only vertex features, SENA generates features for both vertices and relations (edges) by minimizing an objective function composed of a loss function and a regularization term. The loss function mitigates the network sparsity problem by learning from both the edges present in the network (true edges) and those absent from it (false edges). The regularization term preserves the structural properties of the network by efficiently considering (i) the local neighborhood of vertices and edges and (ii) the network spectra, i.e., the eigenvectors of a symmetric matrix representing the network. We compare SENA with four baseline network embedding methods: DeepWalk, SE, SME and TransE. We demonstrate the efficacy of SENA through a task-based evaluation on several real-world networks. Using state-of-the-art algorithms for (i) community detection, (ii) link prediction and (iii) knowledge graph query answering, we show that with SENA's representation these algorithms achieve up to 10%, 9% and (surprisingly) 108% higher accuracy, respectively, than with the best baseline embedding methods.
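An objective of the shape the abstract describes (a loss contrasting true and false edges, plus a regularizer with a local-neighborhood part and a spectral part) can be sketched in Python. This is a minimal illustrative sketch under stated assumptions, not SENA's published formulation. The TransE-style score ||e_u + r - e_v||^2, uniform sampling of false edges, the margin-ranking loss, the Laplacian smoothness term trace(E^T L E) for the local neighborhood, the projection-residual penalty onto the leading Laplacian eigenvectors for the spectral part, the augmentation rule, and every function name and weight (`sena_like_objective`, `augment`, `margin`, `lam_local`, `lam_spec`) are hypothetical choices. The sketch only evaluates the objective; minimizing it over the vertex matrix `E` and relation matrix `R` is omitted.

```python
import numpy as np

def laplacian(n, edges):
    """Combinatorial Laplacian L = D - A of an undirected, unweighted graph."""
    A = np.zeros((n, n))
    for u, v in edges:
        A[u, v] = A[v, u] = 1.0
    return np.diag(A.sum(axis=1)) - A

def spectral_basis(L, d):
    """Orthonormal eigenvectors for the d smallest nontrivial eigenvalues of symmetric L."""
    _, vecs = np.linalg.eigh(L)  # eigenvalues come back in ascending order
    return vecs[:, 1:d + 1]      # skip the constant eigenvector (assumes a connected graph)

def transe_score(E, R, u, r, v):
    """Assumed TransE-style distance: small when e_u + r_r is close to e_v."""
    return float(np.sum((E[u] + R[r] - E[v]) ** 2))

def sample_false_edges(n, true_pairs, k, rng, max_tries=10_000):
    """Uniformly sample up to k vertex pairs that are absent from the network."""
    false, tries = [], 0
    while len(false) < k and tries < max_tries:
        tries += 1
        u, v = (int(x) for x in rng.integers(0, n, size=2))
        if u != v and (u, v) not in true_pairs and (v, u) not in true_pairs:
            false.append((u, v))
    return false

def sena_like_objective(E, R, triples, n, rng, margin=1.0, lam_local=0.1, lam_spec=0.1):
    """Hypothetical objective: margin loss on true vs. false edges + local + spectral terms."""
    pairs = [(u, v) for u, _, v in triples]
    false_pairs = sample_false_edges(n, set(pairs), len(triples), rng)
    # Loss: each true triple should be closer than a false pair under the same relation
    # (zip truncates if fewer false pairs could be sampled).
    loss = sum(
        max(0.0, margin + transe_score(E, R, u, r, v) - transe_score(E, R, fu, r, fv))
        for (u, r, v), (fu, fv) in zip(triples, false_pairs)
    )
    L = laplacian(n, pairs)
    # Local term: trace(E^T L E) equals the sum over edges of ||e_u - e_v||^2.
    local = float(np.trace(E.T @ L @ E))
    # Spectral term: squared distance of E from the span of the leading Laplacian eigenvectors.
    U = spectral_basis(L, E.shape[1])
    spec = float(np.sum((E - U @ (U.T @ E)) ** 2))
    return loss + lam_local * local + lam_spec * spec

def augment(E, R, r, n, pairs, n_add=1, n_remove=1):
    """Hypothetical rule: drop the n_remove worst-scoring edges, add the n_add best non-edges."""
    edge_set = {tuple(sorted(p)) for p in pairs}
    non_edges = [(u, v) for u in range(n) for v in range(u + 1, n) if (u, v) not in edge_set]
    def score(p):
        # TransE-style scores are asymmetric; pairs are scored in (smaller, larger) order.
        return transe_score(E, R, p[0], r, p[1])
    kept = sorted(edge_set, key=score)[: len(edge_set) - n_remove]
    return kept + sorted(non_edges, key=score)[:n_add]

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    n, d = 6, 2
    # Two triangles joined by the bridge (2, 3), with a single relation type 0.
    triples = [(0, 0, 1), (1, 0, 2), (2, 0, 0), (3, 0, 4), (4, 0, 5), (5, 0, 3), (2, 0, 3)]
    E = rng.normal(scale=0.1, size=(n, d))
    R = rng.normal(scale=0.1, size=(1, d))
    print(sena_like_objective(E, R, triples, n, rng))
    print(augment(E, R, 0, n, [(u, v) for u, _, v in triples]))
```

Using the combinatorial Laplacian makes the local term equal to the sum of squared distances between adjacent vertex features, and measuring the spectral term as a distance from a subspace (rather than from individual eigenvectors) avoids the sign and rotation ambiguity of eigenvectors.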

[1] István Lukovits,et al. Resistance-distance matrix: A computational algorithm and its application , 2002 .

[2] Jason Weston,et al. Translating Embeddings for Modeling Multi-relational Data , 2013, NIPS.

[3] P. Jaccard. THE DISTRIBUTION OF THE FLORA IN THE ALPINE ZONE.1 , 1912 .

[4] Sanjukta Bhowmick,et al. On the permanence of vertices in network communities , 2014, KDD.

[5] Santo Fortunato,et al. Finding Statistically Significant Communities in Networks , 2010, PloS one.

[6] Daniel A. Spielman,et al. Spectral Graph Theory and its Applications , 2007, 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS'07).

[7] Jure Leskovec,et al. Supervised random walks: predicting and recommending links in social networks , 2010, WSDM '11.

[8] Rada Mihalcea,et al. Random Walk Term Weighting for Improved Text Classification , 2007, Int. J. Semantic Comput..

[9] Matthew Brand,et al. Continuous nonlinear dimensionality reduction by kernel Eigenmaps , 2003, IJCAI.

[10] Lorenzo Rosasco,et al. Are Loss Functions All the Same? , 2004, Neural Computation.

[11] Peter Eades,et al. On-line Animated Visualization of Huge Graphs using a Modified Spring Algorithm , 1998, J. Vis. Lang. Comput..

[12] Michael William Newman,et al. The Laplacian spectrum of graphs , 2001 .

[13] Mikhail Belkin,et al. Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering , 2001, NIPS.

[14] Jeffrey Dean,et al. Efficient Estimation of Word Representations in Vector Space , 2013, ICLR.

[15] Matthieu Latapy,et al. Computing Communities in Large Networks Using Random Walks , 2004, J. Graph Algorithms Appl..

[16] Jean-Loup Guillaume,et al. Fast unfolding of communities in large networks , 2008, 0803.0476.

[17] Nicolas Le Roux,et al. A latent factor model for highly multi-relational data , 2012, NIPS.

[18] Ulrike von Luxburg,et al. A tutorial on spectral clustering , 2007, Stat. Comput..

[19] S T Roweis,et al. Nonlinear dimensionality reduction by locally linear embedding. , 2000, Science.

[20] Jason Weston,et al. Learning Structured Embeddings of Knowledge Bases , 2011, AAAI.

[21] B. Mohar. THE LAPLACIAN SPECTRUM OF GRAPHS y , 1991 .

[22] David Liben-Nowell,et al. The link-prediction problem for social networks , 2007 .

[23] Lawrence Cayton,et al. Algorithms for manifold learning , 2005 .

[24] Reynold Xin,et al. GraphX: a resilient distributed graph system on Spark , 2013, GRADES.

[25] Jure Leskovec,et al. Overlapping community detection at scale: a nonnegative matrix factorization approach , 2013, WSDM.

[26] James Demmel,et al. Applied Numerical Linear Algebra , 1997 .

[27] Martin Rosvall,et al. Maps of random walks on complex networks reveal community structure , 2007, Proceedings of the National Academy of Sciences.

[28] Robert A. van de Geijn,et al. A Parallel Eigensolver for Dense Symmetric Matrices Based on Multiple Relatively Robust Representations , 2005, SIAM J. Sci. Comput..

[29] Emanuele Viola,et al. Pseudorandom Bits for Polynomials , 2007, 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS'07).

[30] Steven Skiena,et al. DeepWalk: online learning of social representations , 2014, KDD.

[31] Shuicheng Yan,et al. Neighborhood preserving embedding , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[32] Mingzhe Wang,et al. LINE: Large-scale Information Network Embedding , 2015, WWW.

[33] Niloy Ganguly,et al. Metrics for Community Analysis , 2016, ACM Comput. Surv..

[34] Réka Albert,et al. Near linear time algorithm to detect community structures in large-scale networks. , 2007, Physical review. E, Statistical, nonlinear, and soft matter physics.

[35] Xirong Li,et al. Mapping Query to Semantic Concepts: Leveraging Semantic Indices for Automatic and Interactive Video Retrieval , 2007 .

[36] M E J Newman,et al. Community structure in social and biological networks , 2001, Proceedings of the National Academy of Sciences of the United States of America.

[37] Carmen Banea,et al. Random-Walk Term Weighting for Improved Text Classification , 2006 .

[38] Razvan Pascanu,et al. Theano: new features and speed improvements , 2012, ArXiv.

[39] Jason Weston,et al. A semantic matching energy function for learning with multi-relational data , 2013, Machine Learning.

[40] Manuel Blum,et al. Time Bounds for Selection , 1973, J. Comput. Syst. Sci..