VERSE: Versatile Graph Embeddings from Similarity Measures

Embedding a web-scale information network into a low-dimensional vector space facilitates tasks such as link prediction, classification, and visualization. Past research has addressed the problem of extracting such embeddings by adapting methods from words to graphs, without defining a clearly comprehensible graph-related objective. Yet, as we show, the objectives used in past works implicitly utilize similarity measures among graph nodes. In this paper, we carry the similarity orientation of previous works to its logical conclusion; we propose VERtex Similarity Embeddings (VERSE), a simple, versatile, and memory-efficient method that derives graph embeddings explicitly calibrated to preserve the distributions of a selected vertex-to-vertex similarity measure. VERSE learns such embeddings by training a single-layer neural network. While its default, scalable version does so by sampling similarity information, we also develop a variant that uses the full similarity information per vertex. Our experimental study on standard benchmarks and real-world datasets demonstrates that VERSE, instantiated with diverse similarity measures, outperforms state-of-the-art methods in terms of precision and recall in major data mining tasks and surpasses them in time and space efficiency, while the scalable sampling-based variant achieves results as good as those of the non-scalable full variant.
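To make the objective concrete, here is a minimal, illustrative sketch of the *full* (non-sampling) variant described above. It rests on several assumptions that are not taken from the abstract. Personalized PageRank (PPR) is used as the selected similarity measure. The "single-layer neural network" is taken to be one embedding matrix `W` shared by all vertices. The embedding similarity of `v` to every `u` is a softmax over dot products `W[v]·W[u]`. Training is plain gradient descent on the KL divergence between each vertex's PPR distribution and its embedding distribution. The function names (`personalized_pagerank`, `embedding_dist`, `kl_loss`, `train_full_verse`), the toy graph, and the hyperparameters (`alpha=0.85`, `dim`, `lr`, `epochs`) are hypothetical choices for illustration, not the authors' reference implementation.

```python
import math
import random


def personalized_pagerank(adj, alpha=0.85, iters=100):
    """Row v is the PPR distribution of a walk that restarts at v with prob. 1 - alpha."""
    n = len(adj)
    rows = []
    for v in range(n):
        p = [0.0] * n
        p[v] = 1.0
        for _ in range(iters):
            nxt = [0.0] * n
            nxt[v] += 1.0 - alpha  # restart mass returns to the source vertex v
            for u in range(n):
                if p[u] == 0.0:
                    continue
                if adj[u]:
                    share = alpha * p[u] / len(adj[u])
                    for w in adj[u]:
                        nxt[w] += share
                else:
                    nxt[v] += alpha * p[u]  # dangling vertex: jump back to v
            p = nxt
        rows.append(p)
    return rows


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def embedding_dist(W, v):
    """sim_E(v, .): softmax over dot products of W[v] with every row of W."""
    return softmax([sum(a * b for a, b in zip(W[v], W[u])) for u in range(len(W))])


def kl_loss(sim, W):
    """Sum over v of KL(sim[v] || sim_E(v, .))."""
    total = 0.0
    for v in range(len(W)):
        q = embedding_dist(W, v)
        total += sum(p * math.log(p / q[u]) for u, p in enumerate(sim[v]) if p > 0.0)
    return total


def train_full_verse(sim, dim=8, lr=0.05, epochs=500, seed=0):
    """Full variant (assumed form): exact softmax over all vertices, no sampling."""
    n = len(sim)
    rng = random.Random(seed)
    W = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(n)]
    for _ in range(epochs):
        for v in range(n):
            q = embedding_dist(W, v)
            # d(cross-entropy)/d(score_u) = q[u] - sim[v][u]
            err = [q[u] - sim[v][u] for u in range(n)]
            wv = list(W[v])  # snapshot so every update uses the same W[v]
            grad_v = [sum(err[u] * W[u][k] for u in range(n)) for k in range(dim)]
            for u in range(n):  # gradient w.r.t. each W[u] is err[u] * W[v]
                for k in range(dim):
                    W[u][k] -= lr * err[u] * wv[k]
            for k in range(dim):  # gradient w.r.t. W[v] as the source vertex
                W[v][k] -= lr * grad_v[k]
    return W


# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
adj = [[1, 2], [0, 2], [0, 1, 3], [2, 4, 5], [3, 5], [3, 4]]
sim = personalized_pagerank(adj)
before = kl_loss(sim, train_full_verse(sim, epochs=0))  # random initialization
W = train_full_verse(sim)
after = kl_loss(sim, W)
print(f"KL before: {before:.4f}, after: {after:.4f}")
```

Here `before` and `after` are the summed KL divergences at the random initialization and after training. A lower `after` means the embedding distributions have moved toward the PPR distributions. The exact softmax costs O(n) per vertex, which is why this form does not scale. The paper's default variant instead samples similarity information, and this sketch does not attempt that.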
