QUINT: Node embedding using network hashing

Representation learning using network embedding has received tremendous attention because of its efficacy in solving downstream tasks. Popular embedding methods (such as DeepWalk, node2vec, and LINE) are based on a neural architecture and therefore scale poorly to large networks in both time and space usage. Recently, we proposed BinSketch, a sketching technique for compressing binary vectors to binary vectors. In this paper, we show how to extend BinSketch and use it for network hashing. Our proposal, named QUINT, is built upon BinSketch and embeds the nodes of a sparse network into a low-dimensional space using simple bit-wise operations. QUINT is the first of its kind to provide a tremendous gain in speed and space usage without compromising much on the accuracy of the downstream tasks. We conduct extensive experiments to compare QUINT with seven state-of-the-art network embedding methods on two end tasks: link prediction and node classification. We observe a huge performance gain for QUINT in terms of speedup (up to 7000×) and space saving (up to 80×), owing to the bit-wise nature of its node embedding. Moreover, QUINT is a consistent top performer on both tasks among the baselines across all the datasets. Our empirical observations are backed by rigorous theoretical analysis that justifies the effectiveness of QUINT. In particular, we prove that QUINT retains enough structural information to approximate many topological properties of networks with high confidence.
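
The bit-wise idea can be illustrated with a minimal sketch, assuming each node's binary adjacency row is compressed in the BinSketch style: every column is assigned to a random bucket, and the bits that land in the same bucket are OR-ed together. The function name `or_sketch_rows`, the parameter `n_bits`, and the toy graph are illustrative assumptions, not the paper's implementation; QUINT extends BinSketch in ways the abstract does not spell out.

```python
import numpy as np

def or_sketch_rows(adj, n_bits, seed=0):
    """Compress each binary row of `adj` (n x d) into `n_bits` bits by OR-ing.

    Hypothetical helper: a BinSketch-style reduction, not QUINT itself.
    """
    rng = np.random.default_rng(seed)
    d = adj.shape[1]
    # Random map from each original column to one of `n_bits` buckets.
    bucket = rng.integers(0, n_bits, size=d)
    sketch = np.zeros((adj.shape[0], n_bits), dtype=bool)
    rows, cols = np.nonzero(adj)
    # A bucket bit is set if any column mapped to it is set (bit-wise OR).
    sketch[rows, bucket[cols]] = True
    return sketch

# Toy undirected graph on 4 nodes (assumed example data).
adj = np.array([[0, 1, 1, 0],
                [1, 0, 1, 0],
                [1, 1, 0, 1],
                [0, 0, 1, 0]], dtype=bool)
emb = or_sketch_rows(adj, n_bits=3)
```

Each row of `emb` is a short binary embedding of one node. Because the same column-to-bucket map is shared by all rows, nodes with overlapping neighbourhoods share set bits, which is what makes bit-wise similarity between sketches informative.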
