GraRep: Learning Graph Representations with Global Structural Information

In this paper, we present {GraRep}, a novel model for learning vertex representations of weighted graphs. This model learns low dimensional vectors to represent vertices appearing in a graph and, unlike existing work, integrates global structural information of the graph into the learning process. We also formally analyze the connections between our work and several previous research efforts, including the DeepWalk model of Perozzi et al. as well as the skip-gram model with negative sampling of Mikolov et al. We conduct experiments on a language network, a social network as well as a citation network and show that our learned global representations can be effectively used as features in tasks such as clustering, classification and visualization. Empirical results demonstrate that our representation significantly outperforms other state-of-the-art methods in such tasks.

[1]  C. Eckart,et al.  The approximation of one matrix by another of lower rank , 1936 .

[2]  A. Laub,et al.  The singular value decomposition: Its computation and some applications , 1980 .

[3]  Christian Jutten,et al.  Blind separation of sources, part I: An adaptive algorithm based on neuromimetic architecture , 1991, Signal Process..

[4]  Pierre Comon,et al.  Independent component analysis, A new concept? , 1994, Signal Process..

[5]  Curt Burgess,et al.  Producing high-dimensional semantic spaces from lexical co-occurrence , 1996 .

[6]  Jitendra Malik,et al.  Normalized cuts and image segmentation , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[7]  Ali Mansour,et al.  Blind Separation of Sources , 1999 .

[8]  J. Tenenbaum,et al.  A global geometric framework for nonlinear dimensionality reduction. , 2000, Science.

[9]  S T Roweis,et al.  Nonlinear dimensionality reduction by locally linear embedding. , 2000, Science.

[10]  R. Mooney,et al.  Impact of Similarity Measures on Web-page Clustering , 2000 .

[11]  John Caron,et al.  Experiments with LSA scoring: optimal rank and basis , 2001 .

[12]  Mikhail Belkin,et al.  Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering , 2001, NIPS.

[13]  G. Karypis,et al.  Incremental Singular Value Decomposition Algorithms for Highly Scalable Recommender Systems , 2002 .

[14]  Geoffrey E. Hinton,et al.  Reducing the Dimensionality of Data with Neural Networks , 2006, Science.

[15]  Sergei Vassilvitskii,et al.  k-means++: the advantages of careful seeding , 2007, SODA '07.

[16]  J. Bullinaria,et al.  Extracting semantic representations from word co-occurrence statistics: A computational study , 2007, Behavior research methods.

[17]  Geoffrey E. Hinton,et al.  Visualizing Data using t-SNE , 2008 .

[18]  Chih-Jen Lin,et al.  LIBLINEAR: A Library for Large Linear Classification , 2008, J. Mach. Learn. Res..

[19]  Jie Tang,et al.  ArnetMiner: extraction and mining of academic social networks , 2008, KDD.

[20]  Patrick F. Reidy An Introduction to Latent Semantic Analysis , 2009 .

[21]  Huan Liu,et al.  Relational learning via latent social dimensions , 2009, KDD.

[22]  Aapo Hyvärinen,et al.  Noise-Contrastive Estimation of Unnormalized Statistical Models, with Applications to Natural Image Statistics , 2012, J. Mach. Learn. Res..

[23]  Seungjin Choi,et al.  Independent Component Analysis , 2009, Handbook of Natural Computing.

[24]  John A Bullinaria,et al.  Extracting semantic representations from word co-occurrence statistics: stop-lists, stemming, and SVD , 2012, Behavior Research Methods.

[25]  Peter D. Turney Domain and Function: A Dual-Space Model of Semantic Relations and Compositions , 2012, J. Artif. Intell. Res..

[26]  Jeffrey Dean,et al.  Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[27]  Alexander J. Smola,et al.  Distributed large-scale natural graph factorization , 2013, WWW.

[28]  Michael C. Hout,et al.  Multidimensional Scaling , 2003, Encyclopedic Dictionary of Archaeology.

[29]  Enhong Chen,et al.  Learning Deep Representations for Graph Clustering , 2014, AAAI.

[30]  Jeffrey Pennington,et al.  GloVe: Global Vectors for Word Representation , 2014, EMNLP.

[31]  Omer Levy,et al.  Neural Word Embedding as Implicit Matrix Factorization , 2014, NIPS.

[32]  Steven Skiena,et al.  DeepWalk: online learning of social representations , 2014, KDD.

[33]  Mingzhe Wang,et al.  LINE: Large-scale Information Network Embedding , 2015, WWW.