Motif Counting Beyond Five Nodes

Counting graphlets is a well-studied problem in graph mining and social network analysis. Recently, several papers explored very simple and natural algorithms based on Monte Carlo sampling of Markov Chains (MC), and reported encouraging results. We show, perhaps surprisingly, that such algorithms are outperformed by color coding (CC) [2], a sophisticated algorithmic technique that we extend to the case of graphlet sampling and for which we prove strong statistical guarantees. Our computational experiments on graphs with millions of nodes show CC to be more accurate than MC; furthermore, we formally show that the mixing time of the MC approach is too high in general, even when the input graph has high conductance. All this comes at a price, however: while MC is very efficient in terms of space, CC’s memory requirements become demanding as the input graph and the graphlets grow in size. And yet, our experiments show that CC can push the limits of the state of the art, in terms of both the size of the input graph and the size of the graphlets.
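
For readers unfamiliar with color coding, the following is a minimal sketch of the classic idea, not the extension to graphlet sampling described above. It is restricted to counting simple paths on k nodes. Each node receives one of k colors uniformly at random, a dynamic program over color sets counts the "colorful" paths (all colors distinct), and the count is rescaled by k^k / k!, the inverse of the probability that a fixed k-node path is colorful. The function names, the adjacency-dict input format, and the `trials` and `seed` parameters are assumptions made for illustration.

```python
import random
from math import factorial


def count_colorful_paths(adj, k, colors):
    """Count simple k-node paths (k >= 2) whose nodes all have distinct colors.

    adj    : dict mapping each node to an iterable of its neighbors (undirected)
    colors : dict mapping each node to a color in range(k)
    """
    # table[v][mask] = number of colorful paths ending at v whose color set is mask
    table = {v: {1 << colors[v]: 1} for v in adj}
    for _ in range(k - 1):
        nxt = {v: {} for v in adj}
        for u in adj:
            for mask, cnt in table[u].items():
                for v in adj[u]:
                    bit = 1 << colors[v]
                    if mask & bit:
                        continue  # color already used, so the path would not be colorful
                    nxt[v][mask | bit] = nxt[v].get(mask | bit, 0) + cnt
        table = nxt
    total = sum(cnt for v in adj for cnt in table[v].values())
    return total // 2  # every undirected path is found once from each endpoint


def estimate_k_paths(adj, k, trials=100, seed=0):
    """Average the rescaled colorful count over independent random colorings."""
    rng = random.Random(seed)
    scale = k ** k / factorial(k)  # 1 / P(a fixed k-node path is colorful)
    total = 0.0
    for _ in range(trials):
        colors = {v: rng.randrange(k) for v in adj}
        total += count_colorful_paths(adj, k, colors) * scale
    return total / trials
```

The table holds one entry per (node, color set) pair, which suggests why memory grows with both the graph and the pattern size, in line with the trade-off noted above.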

[1] Donald F. Towsley, et al. Efficiently Estimating Motif Statistics of Large Networks, 2013, TKDD.

[2] Ali Pinar, et al. Path Sampling: A Fast and Provable Method for Estimating 4-Vertex Subgraph Counts, 2014, WWW.

[3] Michael D. Vose, et al. A Linear Algorithm For Generating Random Numbers With a Given Distribution, 1991, IEEE Trans. Software Eng.

[4] John C. S. Lui, et al. A General Framework for Estimating Graphlet Statistics via Random Walk, 2016, Proc. VLDB Endow.

[5] V. Climenhaga. Markov chains and mixing times, 2013.

[6] Mark Jerrum, et al. The parameterised complexity of counting connected subgraphs and graph motifs, 2013, J. Comput. Syst. Sci.

[7] Jure Leskovec, et al. Statistical properties of community structure in large social and information networks, 2008, WWW.

[8] Ryan A. Rossi, et al. Efficient Graphlet Counting for Large Networks, 2015, 2015 IEEE International Conference on Data Mining.

[9] Ravi Kumar, et al. Counting Graphlets: Space vs Time, 2017, WSDM.

[10] Christos Faloutsos, et al. DOULION: counting triangles in massive graphs with a coin, 2009, KDD.

[11] Andrzej Lingas, et al. Detecting and Counting Small Pattern Graphs, 2013, ISAAC.

[12] Sergei Vassilvitskii, et al. Counting triangles and the curse of the last reducer, 2011, WWW.

[13] Sebastiano Vigna, et al. The webgraph framework I: compression techniques, 2004, WWW '04.

[14] Xiangliang Zhang, et al. MOSS-5: A Fast Method of Approximating Counts of 5-Node Graphlets in Large Graphs, 2018, IEEE Transactions on Knowledge and Data Engineering.

[15] Mohammad Al Hasan, et al. GUISE: Uniform Sampling of Graphlets for Large Graph Analysis, 2012, 2012 IEEE 12th International Conference on Data Mining.

[16] Mohammad Al Hasan, et al. Finding Network Motifs Using MCMC Sampling, 2015, CompleNet.

[17] Mam Riess Jones. Color Coding, 1962, Human factors.

[18] Christos Faloutsos, et al. Graphs over time: densification laws, shrinking diameters and possible explanations, 2005, KDD '05.

[19] Ali Pinar, et al. ESCAPE: Efficiently Counting All 5-Vertex Subgraphs, 2016, WWW.

[20] Ryan Williams, et al. Finding, minimizing, and counting weighted subgraphs, 2009, STOC '09.

[21] Madhav V. Marathe, et al. Subgraph Enumeration in Large Social Contact Networks Using Parallel Color Coding and Streaming, 2010, 2010 39th International Conference on Parallel Processing.

[22] Baruch Schieber, et al. Subgraph Counting: Color Coding Beyond Trees, 2016, 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS).

[23] Marco Rosa, et al. Layered label propagation: a multiresolution coordinate-free ordering for compressing social networks, 2010, WWW.

[24] Harish Sethu, et al. Waddling Random Walk: Fast and Accurate Mining of Motif Statistics in Large Graphs, 2016, 2016 IEEE 16th International Conference on Data Mining (ICDM).

[25] Jing Tao, et al. A Fast Sampling Method of Exploring Graphlet Degrees of Large Directed and Undirected Graphs, 2016, ArXiv.

[26] W. T. Tutte. Graph Theory, 1984.

[27] Louxin Zhang, et al. Counting motifs in the human interactome, 2013, Nature Communications.

[28] Krishna P. Gummadi, et al. On the evolution of user interaction in Facebook, 2009, WOSN '09.

[29] Kamesh Madduri, et al. Fast Approximate Subgraph Counting and Enumeration, 2013, 2013 42nd International Conference on Parallel Processing.

[30] F. Chung. Four proofs for the Cheeger inequality and graph partition algorithms, 2007.

[31] Jure Leskovec, et al. Defining and evaluating network communities based on ground-truth, 2012, KDD 2012.