Triangle Counting in Dynamic Graph Streams

Estimating the number of triangles in graph streams using a limited amount of memory has become a popular topic in the last decade. Different variations of the problem have been studied, depending on whether the graph edges are provided in an arbitrary order or as incidence lists. However, with a few exceptions, the algorithms have considered insert-only streams. We present a new algorithm estimating the number of triangles in dynamic graph streams where edges can be both inserted and deleted. We show that our algorithm achieves better time and space complexity than previous solutions for various graph classes, for example sparse graphs with a relatively small number of triangles. Also, for graphs with constant transitivity coefficient, a common situation in real graphs, this is the first algorithm achieving constant processing time per edge. The result is achieved by a novel approach combining sampling of vertex triples and sparsification of the input graph. In the course of the analysis of the algorithm we present a lower bound on the number of pairwise independent 2-paths in general graphs which might be of independent interest. At the end of the paper we discuss lower bounds on the space complexity of triangle counting algorithms that make no assumptions on the structure of the graph.

[1]  Mihail N. Kolountzakis,et al.  Triangle Sparsifiers , 2011, J. Graph Algorithms Appl..

[2]  Charalampos E. Tsourakakis,et al.  Colorful triangle counting and a MapReduce implementation , 2011, Inf. Process. Lett..

[3]  Kun-Lung Wu,et al.  Counting and Sampling Triangles from a Graph Stream , 2013, Proc. VLDB Endow..

[4]  A. Rbnyi ON THE EVOLUTION OF RANDOM GRAPHS , 2001 .

[5]  Christos Faloutsos,et al.  DOULION: counting triangles in massive graphs with a coin , 2009, KDD.

[6]  Mikkel Thorup,et al.  Tabulation-Based 5-Independent Hashing with Applications to Linear Probing and Second Moment Estimation , 2012, SIAM J. Comput..

[7]  P. Erdos,et al.  On the evolution of random graphs , 1984 .

[8]  Albert-László Barabási,et al.  Statistical mechanics of complex networks , 2001, ArXiv.

[9]  Moni Naor,et al.  Backyard Cuckoo Hashing: Constant Worst-Case Operations with a Succinct Representation , 2009, 2010 IEEE 51st Annual Symposium on Foundations of Computer Science.

[10]  Martin Dietzfelbinger,et al.  Design Strategies for Minimal Perfect Hash Functions , 2007, SAGA.

[11]  Luca Becchetti,et al.  Efficient algorithms for large-scale local triangle counting , 2010, TKDD.

[12]  Piotr Indyk,et al.  Sampling in Dynamic Data Streams and Applications , 2008, Int. J. Comput. Geom. Appl..

[13]  Christian Sohler,et al.  Counting triangles in data streams , 2006, PODS.

[14]  List of Open Problems in Sublinear Algorithms , .

[15]  Hossein Jowhari,et al.  Tight bounds for Lp samplers, finding duplicates in streams, and related problems , 2010, PODS.

[16]  Anna Pagh,et al.  Uniform Hashing in Constant Time and Optimal Space , 2008, SIAM J. Comput..

[17]  Piotr Indyk,et al.  Sampling in dynamic data streams and applications , 2005, Int. J. Comput. Geom. Appl..

[18]  Marco Rosa,et al.  Four degrees of separation , 2011, WebSci '12.

[19]  Kurt Mehlhorn,et al.  Approximate Counting of Cycles in Streams , 2011, ESA.

[20]  Sudipto Guha,et al.  Graph sketches: sparsification, spanners, and subgraphs , 2012, PODS.

[21]  Rasmus Pagh,et al.  Cuckoo Hashing , 2001, Encyclopedia of Algorithms.

[22]  Virginia Vassilevska Williams,et al.  Multiplying matrices faster than coppersmith-winograd , 2012, STOC '12.

[23]  Thomas Sauerwald,et al.  Counting Arbitrary Subgraphs in Data Streams , 2012, ICALP.

[24]  S. Muthukrishnan,et al.  Data streams: algorithms and applications , 2005, SODA '03.

[25]  Tamara G. Kolda,et al.  Triadic Measures on Graphs: The Power of Wedge Sampling , 2012, SDM.

[26]  Mikkel Thorup,et al.  Tabulation based 4-universal hashing with applications to second moment estimation , 2004, SODA '04.

[27]  Jonathan W. Berry,et al.  Tolerating the community detection resolution limit with edge weighting. , 2008, Physical review. E, Statistical, nonlinear, and soft matter physics.

[28]  Sergei Vassilvitskii,et al.  Densest Subgraph in Streaming and MapReduce , 2012, Proc. VLDB Endow..

[29]  Noam Nisan,et al.  On Randomized One-round Communication Complexity , 1995, STOC '95.

[30]  Mohammad Ghodsi,et al.  New Streaming Algorithms for Counting Triangles in Graphs , 2005, COCOON.

[31]  Fan Chung Graham,et al.  A random graph model for massive graphs , 2000, STOC '00.

[32]  Mikkel Thorup,et al.  The power of simple tabulation hashing , 2010, STOC.

[33]  Larry Carter,et al.  Universal Classes of Hash Functions , 1979, J. Comput. Syst. Sci..

[34]  Mihail N. Kolountzakis,et al.  Efficient Triangle Counting in Large Graphs via Degree-Based Vertex Partitioning , 2010, Internet Math..

[35]  Noga Alon,et al.  Finding and counting given length cycles , 1997, Algorithmica.