Search Efficient Binary Network Embedding

Traditional network embedding primarily focuses on learning a continuous vector representation for each node, preserving network structure and/or node content information, so that off-the-shelf machine learning algorithms can be readily applied to the vector-format node representations for network analysis. However, the learned continuous vector representations are inefficient for large-scale similarity search, which often involves finding nearest neighbors measured by distance or similarity in a continuous vector space. In this article, we propose a search-efficient binary network embedding algorithm, called BinaryNE, that learns a binary code for each node by simultaneously modeling node context relations and node attribute relations through a three-layer neural network. BinaryNE learns these binary node representations with an online learning algorithm based on stochastic gradient descent. The learned binary encoding not only reduces the memory needed to represent each node, but also enables fast bit-wise comparisons, which make node similarity search faster than search with Euclidean or other distance measures. Extensive experiments and comparisons show that BinaryNE delivers more than 25 times faster search while providing comparable or better search quality than traditional network embedding methods based on continuous vectors. The binary codes learned by BinaryNE also achieve competitive performance on node classification and node clustering tasks. The source code of the BinaryNE algorithm is available at https://github.com/daokunzhang/BinaryNE.
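The bit-wise comparison described above can be sketched as follows. This is a minimal illustration, not the BinaryNE implementation: it assumes each node's binary code is stored as a packed Python integer and that similarity is measured by Hamming distance (XOR the two codes, then count the set bits). The helper names `pack_bits`, `hamming`, `top_k`, and `memory_bytes` are hypothetical, and the memory comparison assumes a continuous embedding stores each dimension as a 32-bit float.

```python
def pack_bits(bits):
    """Pack a sequence of 0/1 values into one int; bit i holds bits[i]."""
    code = 0
    for i, b in enumerate(bits):
        if b:
            code |= 1 << i
    return code


def hamming(a, b):
    """Hamming distance between two packed codes: XOR, then count set bits."""
    return bin(a ^ b).count("1")


def top_k(query, codes, k):
    """Indices of the k codes closest to `query` in Hamming distance.

    Linear scan; ties are broken by the smaller node index.
    """
    scored = sorted((hamming(query, c), i) for i, c in enumerate(codes))
    return [i for _, i in scored[:k]]


def memory_bytes(d):
    """Bytes per node: d float32 values (assumed) vs. d bits packed into bytes."""
    return 4 * d, (d + 7) // 8


# Usage: four hypothetical 4-bit node codes.
codes = [pack_bits(v) for v in ([1, 0, 1, 1], [1, 1, 0, 1], [0, 0, 0, 0], [1, 0, 1, 0])]
query = pack_bits([1, 0, 1, 1])
print(top_k(query, codes, 2))  # → [0, 3]
print(memory_bytes(128))       # → (512, 16)
```

Under these assumptions a 128-dimensional code needs 16 bytes instead of 512, a 32-fold reduction. The linear scan is kept for clarity; in practice, codes are typically packed into fixed-width machine words so that XOR and popcount each run as a single CPU instruction over many dimensions at once.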