SNOD: a fast sampling method of exploring node orbit degrees for large graphs

Exploring small connected and induced subgraph patterns (CIS patterns, or graphlets) has recently attracted considerable attention. Despite recent efforts to compute how frequently a graphlet appears in a large graph (i.e., the total number of CISes isomorphic to the graphlet), little effort has been made to characterize a node’s graphlet orbit degree, i.e., the number of CISes isomorphic to the graphlet that touch the node at a particular orbit. This is an important fine-grained metric for analyzing complex networks, for example when learning the functions or roles of nodes in social and biological networks. Like global graphlet counting, computing node orbit degrees for a large graph is computationally intensive. Furthermore, previous methods for computing global graphlet counts are not suited to this problem. In this paper, we propose a novel sampling method, SNOD, to efficiently estimate node orbit degrees for large-scale graphs and to quantify the error of our estimates. To the best of our knowledge, we are the first to study this problem and give a fast, scalable solution. We conduct experiments on a variety of real-world datasets and demonstrate that SNOD is several orders of magnitude faster than state-of-the-art enumeration methods for accurately estimating node orbit degrees on graphs with millions of edges.
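To make the notion of a node orbit degree concrete, the following is a minimal sketch that computes exact orbit degrees for the two 3-node graphlets (the wedge, with an end orbit and a center orbit, and the triangle, with a single orbit). It is a brute-force illustration of the quantity being estimated, not the SNOD sampling method. The function name `orbit_degrees_3`, the adjacency-set representation `adj`, and the orbit labels are assumptions introduced here for illustration.

```python
from itertools import combinations


def orbit_degrees_3(adj, v):
    """Exact 3-node orbit degrees of node v (illustrative, not SNOD).

    adj maps each node to the set of its neighbors in a simple
    undirected graph. Counts refer to induced subgraphs.
    """
    nbrs = adj[v]
    d = len(nbrs)

    # Triangle orbit: pairs of neighbors of v that are themselves adjacent.
    triangles = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])

    # Wedge-center orbit: pairs of neighbors of v that are NOT adjacent.
    wedge_center = d * (d - 1) // 2 - triangles

    # Wedge-end orbit: paths v-u-w with w not adjacent to v.
    # Each triangle at v contributes two non-induced paths, so subtract them.
    wedge_end = sum(len(adj[u]) - 1 for u in nbrs) - 2 * triangles

    return {
        "wedge_end": wedge_end,
        "wedge_center": wedge_center,
        "triangle": triangles,
    }


# Toy graph: triangle 0-1-2 with a pendant node 3 attached to node 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
for node in sorted(adj):
    print(node, orbit_degrees_3(adj, node))
```

Even for 3-node graphlets this exact computation inspects every pair of neighbors of a node, which hints at why enumeration becomes expensive on large graphs and larger graphlets, and why a sampling-based estimator is attractive.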
