SENA: Preserving Social Structure for Network Embedding

Network embedding transforms a network into a continuous feature space. Network augmentation, in turn, uses this feature representation to obtain a more informative network by adding plausible edges and removing noisy ones. Traditional network embedding methods are often ineffective at capturing (i) latent relationships when the network is sparse (the network sparsity problem), and (ii) the local and global neighborhood structure of vertices (the structure preserving problem). We propose SENA, a structural embedding and network augmentation framework for social network analysis. Unlike other embedding methods, which generate only vertex features, SENA generates features for both vertices and relations (edges) by minimizing an objective function composed of a loss function and a regularization term. The loss function mitigates the network sparsity problem by learning from both the edges present in the network (true edges) and those absent from it (false edges). The regularization term preserves the structural properties of the network by considering (i) the local neighborhood of vertices and edges, and (ii) the network spectra, i.e., the eigenvectors of a symmetric matrix representing the network. We compare SENA with four baseline network embedding methods: DeepWalk, SE, SME and TransE. We demonstrate the efficacy of SENA through a task-based evaluation on several real-world networks. We consider state-of-the-art algorithms for (i) community detection, (ii) link prediction and (iii) knowledge graph query answering, and show that with SENA's representation these algorithms achieve up to 10%, 9% and (surprisingly) 108% higher accuracy, respectively, than with the best baseline embedding method.
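To make two ingredients of the abstract concrete, the sketch below computes the "network spectra" of a toy graph and splits its vertex pairs into true and false edges. It is an illustration only, not SENA's implementation: the abstract does not say which symmetric matrix is used, so the unnormalized Laplacian L = D - A is an assumption here, and the function names `laplacian_spectrum` and `true_and_false_edges` are hypothetical.

```python
import numpy as np


def laplacian_spectrum(adj):
    """Eigenvalues and eigenvectors of the unnormalized Laplacian L = D - A.

    Assumption: the abstract only says "a symmetric matrix representing
    the network"; the Laplacian is one common choice, picked for illustration.
    """
    adj = np.asarray(adj, dtype=float)
    degree = np.diag(adj.sum(axis=1))
    lap = degree - adj
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    return np.linalg.eigh(lap)


def true_and_false_edges(adj):
    """Split the vertex pairs (i < j) of an undirected graph into present and absent edges."""
    adj = np.asarray(adj)
    n = adj.shape[0]
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    true_edges = [p for p in pairs if adj[p] != 0]
    false_edges = [p for p in pairs if adj[p] == 0]
    return true_edges, false_edges


# Toy graph: triangles {0, 1, 2} and {3, 4, 5} joined by the single edge (2, 3).
A = np.zeros((6, 6), dtype=int)
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1

eigvals, eigvecs = laplacian_spectrum(A)
true_edges, false_edges = true_and_false_edges(A)

# Eigenvector of the second-smallest eigenvalue; its signs separate the two triangles.
second_eigvec = eigvecs[:, 1]
```

On this toy graph the entries of `second_eigvec` take one sign on vertices 0 to 2 and the opposite sign on vertices 3 to 5, which gives a small picture of how spectral information can encode global structure. A real sparse network has far more false edges than true ones, so a practical loss would presumably sample the false edges rather than enumerate them; that, too, is an assumption and not something the abstract states.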
