How Hard Is Counting Triangles in the Streaming Model?

The problem of (approximately) counting the number of triangles in a graph is one of the basic problems in graph theory. In this paper we study the problem in the streaming model; specifically, we study the amount of memory a randomized algorithm requires to solve it. When the algorithm is allowed one pass over the stream, we present a best-possible lower bound of Ω(m) for graphs G with m edges. If a constant number of passes is allowed, we show a lower bound of Ω(m/T), where T is the number of triangles. We match this lower bound, in some sense, with a 2-pass O(m/T^{1/3})-memory algorithm that distinguishes graphs with no triangles from graphs with at least T triangles. We introduce a new graph parameter ρ(G), the triangle density, and conjecture that the space complexity of the triangle counting problem is Θ(m/ρ(G)). We match this with a second algorithm that solves the distinguishing problem using O(m/ρ(G)) memory.
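To make the distinguishing problem concrete, here is a minimal sketch of a generic two-pass, sampling-based triangle detector. It is an illustration only and is not claimed to be the algorithm of the paper: the function name `has_triangle_two_pass`, the sampling probability `p`, and the `seed` parameter are all assumptions introduced here, and no memory bound is asserted for it. It assumes a simple undirected graph (no self-loops, no repeated edges) given as a sequence of edges that can be read twice.

```python
import random
from collections import defaultdict


def has_triangle_two_pass(edges, p, seed=None):
    """Illustrative two-pass detector (hypothetical helper, not the paper's method).

    Returns True only when a triangle is certainly present, so a
    triangle-free graph is never misreported. It may return False on a
    graph that does contain triangles if the sample misses them.
    """
    rng = random.Random(seed)

    # Pass 1: keep each edge independently with probability p.
    sampled = defaultdict(set)
    for u, v in edges:
        if rng.random() < p:
            sampled[u].add(v)
            sampled[v].add(u)

    # Pass 2: a streamed edge (u, v) closes a triangle if the sample holds
    # edges (u, w) and (v, w) for some common vertex w.
    empty = frozenset()
    for u, v in edges:
        if sampled.get(u, empty) & sampled.get(v, empty):
            return True
    return False
```

The memory used is proportional to the number of sampled edges, roughly p·m in expectation, which is the kind of trade-off between sample size and detection probability that the bounds above quantify. Because the error is one-sided, repeating the procedure with fresh randomness can only increase the chance of detecting a triangle.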
