A Fast Sampling Method of Exploring Graphlet Degrees of Large Directed and Undirected Graphs

Exploring small connected induced subgraph (CIS) patterns, also known as graphlets, has recently attracted considerable attention. Despite recent efforts to compute the number of times a specific graphlet appears in a large graph (i.e., the total number of CISes isomorphic to the graphlet), little attention has been paid to characterizing a node's graphlet degree, i.e., the number of CISes isomorphic to the graphlet that include the node. The graphlet degree is an important metric for analyzing complex networks such as social and biological networks. As with global graphlet counting, computing node graphlet degrees for a large graph is challenging because of the combinatorial nature of the problem, and previous methods for computing global graphlet counts are not suited to this task. In this paper we propose sampling methods to estimate node graphlet degrees for undirected and directed graphs, and we analyze the error of our estimates. To the best of our knowledge, we are the first to study this problem and to give a fast, scalable solution. Experiments on a variety of real-world datasets demonstrate that our methods accurately and efficiently estimate node graphlet degrees for graphs with millions of edges.
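To make the definition of a node's graphlet degree concrete, the following minimal Python sketch counts, exactly, the two connected 3-node undirected graphlets (the triangle and the induced 2-path) that include a given node. It assumes the graph is stored as a dict mapping each integer node to the set of its neighbours; the function names are hypothetical. The second function is not the estimator proposed in this paper. It is a generic DOULION-style edge-sampling estimate of a node's triangle degree, included only to illustrate the idea that sampling followed by rescaling can give an unbiased estimate.

```python
import random
from itertools import combinations


def build_adj(edges):
    """Build an undirected adjacency dict (node -> set of neighbours)."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def graphlet_degrees_3(adj, v):
    """Exact 3-node graphlet degrees of node v (hypothetical helper).

    Counts the connected induced subgraphs on 3 nodes that contain v,
    split into triangles and induced 2-paths (wedges)."""
    nbrs = adj[v]
    # Triangles through v: pairs of neighbours that are themselves adjacent.
    tri = sum(1 for u, w in combinations(nbrs, 2) if w in adj[u])
    # Induced paths u-v-w with v in the middle: non-adjacent neighbour pairs.
    center = len(nbrs) * (len(nbrs) - 1) // 2 - tri
    # Induced paths v-u-w with v at an end: w is two hops away, not adjacent to v.
    end = sum(1 for u in nbrs for w in adj[u] if w != v and w not in nbrs)
    return {"triangle": tri, "path": center + end}


def sampled_triangle_degree(adj, v, p, rng):
    """Generic edge-sampling estimate of v's triangle degree (illustration only).

    Keeps each undirected edge independently with probability p, counts the
    triangles through v in the sampled graph, and rescales by 1 / p**3."""
    kept = {}
    for a in adj:
        for b in adj[a]:
            # a < b visits each undirected edge once (nodes must be orderable).
            if a < b and rng.random() < p:
                kept.setdefault(a, set()).add(b)
                kept.setdefault(b, set()).add(a)
    nbrs = kept.get(v, set())
    tri = sum(1 for u, w in combinations(nbrs, 2) if w in kept.get(u, set()))
    # A triangle survives only if all three of its edges are kept: probability p**3.
    return tri / p**3


if __name__ == "__main__":
    adj = build_adj([(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)])
    print(graphlet_degrees_3(adj, 2))  # {'triangle': 1, 'path': 3}
```

In the example graph, node 2 lies in the triangle {0, 1, 2} and in the induced paths {0, 2, 3}, {1, 2, 3} and {2, 3, 4}. Because a triangle is kept with probability p³, the rescaled count in the sampling function has the exact triangle degree as its expectation; its variance grows quickly as p shrinks, which is one reason more refined sampling schemes are of interest.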