Network Vector: Distributed Representations of Networks with Global Context

We propose a neural embedding algorithm, Network Vector, that learns distributed representations of nodes and of entire networks simultaneously. By embedding networks in a low-dimensional space, the algorithm allows us to compare networks in terms of structural similarity and to solve predictive problems. Unlike alternative approaches that focus on node-level features, we learn a continuous global vector that captures each node's global context by maximizing the predictive likelihood of random-walk paths in the network. Our algorithm scales to real-world graphs with many nodes. We evaluate it on datasets from diverse domains and compare it with state-of-the-art techniques on node classification, role discovery, and concept analogy tasks. The empirical results demonstrate both the effectiveness and the efficiency of our algorithm.
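
Since the abstract only sketches the approach, the toy example below illustrates the core idea: node vectors and a single global network vector are trained jointly, via negative sampling, to predict nodes along random-walk paths, so the global vector comes to summarize the network's structure. This is a minimal sketch; the toy graph, hyperparameters, and the way the global vector is combined with the context (simple averaging) are illustrative assumptions, not the paper's exact formulation.

```python
# A minimal sketch of the idea in the abstract, not the authors' exact
# algorithm: node vectors and one global network vector are trained
# jointly, via negative sampling, to predict nodes along random walks.
import random
import numpy as np

random.seed(0)
np.random.seed(0)

# Hypothetical toy undirected graph as an adjacency list.
graph = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3], 3: [1, 2, 4], 4: [3]}
n_nodes, dim = len(graph), 16

def random_walk(g, start, length):
    """Uniform random walk of the given length starting at `start`."""
    walk = [start]
    for _ in range(length - 1):
        walk.append(random.choice(g[walk[-1]]))
    return walk

walks = [random_walk(graph, v, 10) for v in graph for _ in range(20)]

W_in = (np.random.rand(n_nodes, dim) - 0.5) / dim   # input node vectors
W_out = np.zeros((n_nodes, dim))                    # output (context) vectors
g_vec = (np.random.rand(dim) - 0.5) / dim           # global network vector

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

lr, window, n_neg = 0.025, 2, 3
for _ in range(5):  # epochs
    for walk in walks:
        for i, target in enumerate(walk):
            lo, hi = max(0, i - window), min(len(walk), i + window + 1)
            ctx = [walk[j] for j in range(lo, hi) if j != i]
            if not ctx:
                continue
            # Hidden representation: average of the context node vectors
            # and the global network vector (one illustrative choice).
            h = (W_in[ctx].mean(axis=0) + g_vec) / 2.0
            negs = [random.randrange(n_nodes) for _ in range(n_neg)]
            samples = [(target, 1.0)] + [(n, 0.0) for n in negs if n != target]
            grad_h = np.zeros(dim)
            for node, label in samples:
                g = (sigmoid(W_out[node] @ h) - label) * lr
                grad_h += g * W_out[node]
                W_out[node] -= g * h
            # Backpropagate through the averaging into the node vectors
            # and into the shared global vector.
            for c in ctx:
                W_in[c] -= grad_h / (2.0 * len(ctx))
            g_vec -= grad_h / 2.0

print("learned network vector:", np.round(g_vec, 3))
```

Training one such global vector per network would then allow networks to be compared directly in the embedding space, e.g. by cosine similarity.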
