A Unified Framework to Estimate Global and Local Graphlet Counts for Streaming Graphs

Counting small connected subgraph patterns called graphlets is emerging as a powerful tool for exploring the topological structure of networks and for analyzing the roles of individual nodes. Graphlets have numerous applications, ranging from biology to network science. Computing graphlet counts for dynamic graphs is highly challenging because of the streaming nature of the input, the sheer size of the graphs, and the superlinear time complexity of the problem, and few practical results are known for massive streaming graphs. In this work, we propose a unified framework that estimates both the graphlet counts of the whole graph and the graphlet counts of individual nodes in the streaming graph setting. Our framework subsumes previous methods and provides more flexible and accurate estimates of graphlet counts. We propose a general unbiased estimator that applies to k-node graphlets for any k, and we provide efficient implementations for 3-node and 4-node graphlets. We perform a detailed empirical study on real-world graphs and show that our framework estimates graphlet counts for streaming graphs with 1.7 to 170.8 times smaller error than other state-of-the-art methods. Our framework also estimates the graphlet counts of each individual node with high accuracy, which previous works could not achieve.
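The abstract does not spell out the estimator itself. The following is a hedged illustration of the general idea behind unbiased streaming graphlet estimation. It is not the framework proposed in this paper, and it covers only the 3-node case (triangles), using a simple, well-known scheme. Each arriving edge is kept independently with probability p. Every triangle that a new edge closes against the current sample is counted with weight 1/p^2, both globally and for each of its three nodes. The function name `estimate_triangles` and the parameter `p` are hypothetical. The stream is assumed to be a sequence of undirected edges in which each edge appears at most once.

```python
import random
from collections import defaultdict


def estimate_triangles(edge_stream, p, seed=None):
    """Hypothetical sketch: unbiased global and per-node triangle estimates.

    Assumes an undirected edge stream with no duplicate edges. Each edge is
    kept independently with probability p. When edge (u, v) arrives, every
    common neighbour w in the sampled graph closes a triangle whose two
    earlier edges were both kept with probability p**2, so counting it with
    weight 1 / p**2 gives an unbiased estimate.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    rng = random.Random(seed)
    adj = defaultdict(set)            # adjacency sets of the sampled subgraph
    weight = 1.0 / (p * p)            # inverse probability that both earlier edges survived
    global_count = 0.0
    local_count = defaultdict(float)  # per-node (local) triangle estimates

    for u, v in edge_stream:
        if u == v:
            continue                  # self-loops cannot be part of a triangle
        # Count triangles closed by (u, v) before deciding whether to keep it.
        for w in adj[u] & adj[v]:
            global_count += weight
            local_count[u] += weight
            local_count[v] += weight
            local_count[w] += weight
        # Keep the new edge with probability p (always kept when p == 1).
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)

    return global_count, dict(local_count)


if __name__ == "__main__":
    # The complete graph on 4 nodes has 4 triangles, and each node lies in 3.
    k4 = [(0, 1), (0, 2), (1, 2), (0, 3), (1, 3), (2, 3)]
    print(estimate_triangles(k4, p=1.0))
    print(estimate_triangles(k4, p=0.5, seed=7))
```

With p = 1 every edge is kept and the counts are exact. With p < 1 a single run is noisy, but its expectation equals the true count. Counting a triangle before deciding whether to keep the closing edge means only the two earlier edges must survive, which is why the weight is 1/p^2 rather than 1/p^3.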
