Computing Clustering Coefficients in Data Streams

We present random sampling algorithms that, with probability at least 1 − δ, compute a (1 ± ε)-approximation of the clustering coefficient, the transitivity coefficient, and the number of bipartite cliques in a graph given as a stream of edges. Our methods can be extended to approximately count the number of occurrences of fixed constant-size subgraphs. Our algorithms require only one pass over the input stream, and their storage space depends only on structural parameters of the graph, the approximation guarantee, and the confidence probability. For example, the space used by the algorithms for the clustering and transitivity coefficients depends on the value of that coefficient but not on the size of the graph. Since many large social networks have small clustering and transitivity coefficients, our algorithms use space independent of the size of the input for these graphs.

We implemented our algorithms and evaluated their performance on networks from different application domains. The sizes of the considered input graphs varied from about 8,000 nodes and 40,000 edges to about 135 million nodes and more than 1 billion edges. For both algorithms we ran experiments with sample set sizes varying from 100,000 to 1,000,000 to evaluate running time and approximation guarantee. Our algorithms appear to be time efficient for these sample sizes.
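To make the estimated quantity concrete, the following is a minimal offline sketch, not the one-pass streaming algorithm of this paper. It assumes the standard definition of the transitivity coefficient as the fraction of wedges (paths of length two) that are closed by a third edge, and it estimates that fraction by sampling wedges uniformly at random from a graph held in memory. The function names `build_adjacency`, `exact_transitivity`, and `sampled_transitivity` are hypothetical and chosen only for this illustration.

```python
import random
from collections import defaultdict
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map, ignoring self-loops and duplicates."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return dict(adj)


def exact_transitivity(adj):
    """Fraction of wedges that are closed (equals 3 * triangles / wedges)."""
    wedges = 0
    closed = 0
    for nbrs in adj.values():
        d = len(nbrs)
        wedges += d * (d - 1) // 2
        # Each pair of neighbours of the centre node forms one wedge.
        for a, b in combinations(nbrs, 2):
            if b in adj[a]:
                closed += 1
    return closed / wedges if wedges else 0.0


def sampled_transitivity(adj, num_samples, rng):
    """Estimate the transitivity coefficient from uniformly sampled wedges.

    Assumption: the whole graph is in memory, so this is NOT a streaming
    algorithm; it only illustrates the sampled quantity.
    """
    centers = [v for v, nbrs in adj.items() if len(nbrs) >= 2]
    if not centers:
        return 0.0
    # A node of degree d is the centre of d * (d - 1) / 2 wedges.
    weights = [len(adj[v]) * (len(adj[v]) - 1) // 2 for v in centers]
    closed = 0
    for v in rng.choices(centers, weights=weights, k=num_samples):
        a, b = rng.sample(list(adj[v]), 2)
        if b in adj[a]:
            closed += 1
    return closed / num_samples


if __name__ == "__main__":
    # A triangle {0, 1, 2} with one pendant edge (2, 3).
    graph = build_adjacency([(0, 1), (1, 2), (0, 2), (2, 3)])
    print(exact_transitivity(graph))
    print(sampled_transitivity(graph, 20000, random.Random(0)))
```

In this sketch the number of samples plays the role of the sample set size: more samples tighten the estimate, and the number needed for a fixed relative error grows as the coefficient itself shrinks, which matches the dependence on the coefficient rather than on the graph size described above.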
