The input/output complexity of triangle enumeration

We consider the well-known problem of enumerating all triangles of an undirected graph. Our focus is on determining the input/output (I/O) complexity of this problem. Let $E$ be the number of edges, $M < E$ the size of internal memory, and $B$ the block size. The best results obtained previously are $\mathrm{sort}(E^{3/2})$ I/Os (Dementiev, PhD thesis 2006) and $O(E^2/(MB))$ I/Os (Hu et al., SIGMOD 2013), where $\mathrm{sort}(n)$ denotes the number of I/Os for sorting $n$ items. We improve the I/O complexity to $O(E^{3/2}/(\sqrt{M}B))$ expected I/Os, which improves the previous bounds by a factor $\min(\sqrt{E/M}, \sqrt{M})$. Our algorithm is cache-oblivious and also I/O optimal: we show that any algorithm enumerating $t$ distinct triangles must always use $\Omega(t/(\sqrt{M}B))$ I/Os, and there are graphs for which $t = \Omega(E^{3/2})$. Finally, we give a deterministic cache-aware algorithm using $O(E^{3/2}/(\sqrt{M}B))$ I/Os assuming $M > E^c$ for a constant $c > 0$. Our results are based on a new color coding technique, which may be of independent interest.
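The claim that some graphs have a number of triangles on the order of $E^{3/2}$ can be checked on a complete graph, which has $\binom{n}{2}$ edges and $\binom{n}{3}$ triangles. The following is a minimal in-memory sketch for illustration only: it is not the I/O-efficient algorithm of the paper, and the names `enumerate_triangles` and `complete_graph` are hypothetical helpers introduced here. It assumes a simple graph (no self-loops) with comparable vertex labels.

```python
from itertools import combinations
from math import comb


def enumerate_triangles(edges):
    """Yield each triangle (u, v, w) with u < v < w exactly once.

    Assumption: the graph is simple (no self-loops); duplicate edges
    are harmless because neighbors are stored in sets.
    """
    # Orient every edge from its smaller endpoint to its larger one.
    higher = {}
    for u, v in edges:
        a, b = (u, v) if u < v else (v, u)
        higher.setdefault(a, set()).add(b)
    for u, nbrs in higher.items():
        for v in nbrs:
            # w closes a triangle if it is a higher neighbor of both u and v.
            for w in nbrs & higher.get(v, set()):
                yield (u, v, w)


def complete_graph(n):
    """Return the edge list of the complete graph on vertices 0..n-1."""
    return list(combinations(range(n), 2))


n = 30
edges = complete_graph(n)
triangles = list(enumerate_triangles(edges))
E, t = len(edges), len(triangles)
# For complete graphs, t / E^{3/2} tends to sqrt(2)/3 as n grows.
ratio = t / E ** 1.5
print(E, t, round(ratio, 3))
```

On complete graphs the ratio $t / E^{3/2}$ stays bounded below by a positive constant, which is what $t = \Omega(E^{3/2})$ requires; combined with the lower bound of $\Omega(t/(\sqrt{M}B))$ I/Os, such graphs match the stated upper bound.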

[1] Noshir S. Contractor, et al. Is a friend a friend?: investigating the structure of friendship networks in virtual worlds, 2010, CHI Extended Abstracts.

[2] Alok Aggarwal, et al. The input/output complexity of sorting and related problems, 1988, CACM.

[3] Yufei Tao, et al. I/O-Efficient Algorithms on Triangle Listing and Counting, 2014, ACM Trans. Database Syst.

[4] William Kent, et al. A Simple Guide to Five Normal Forms in Relational Database Theory, 2000.

[5] Yufei Tao, et al. Massive graph triangulation, 2013, SIGMOD '13.

[6] Noga Alon, et al. Simple construction of almost k-wise independent random variables, 1990, Proceedings [1990] 31st Annual Symposium on Foundations of Computer Science.

[7] Noga Alon, et al. Simple Construction of Almost k-wise Independent Random Variables, 1992, Random Struct. Algorithms.

[8] Charles E. Leiserson, et al. Cache-Oblivious Algorithms, 2003, CIAC.

[9] James Cheng, et al. Triangle listing in massive networks, 2012, TKDD.

[10] Jeffrey D. Ullman, et al. Upper and Lower Bounds on the Cost of a Map-Reduce Computation, 2012, Proc. VLDB Endow.

[11] Roman Dementiev. Algorithm engineering for large data sets: hardware, software, algorithms, 2007.

[12] Christoph M. Hoffmann, et al. A graph-constructive approach to solving systems of geometric constraints, 1997, TOGS.

[13] Cynthia A. Phillips, et al. Why do simple algorithms for triangle enumeration work in the real world?, 2014, Internet Math.

[14] Peter Bro Miltersen, et al. On showing lower bounds for external-memory computational geometry problems, 1998, External Memory Algorithms.

[15] Gerth Stølting Brodal, et al. On the limits of cache-obliviousness, 2003, STOC '03.

[16] Francesco Silvestri, et al. On the limits of cache-oblivious rational permutations, 2008, Theor. Comput. Sci.

[17] Emanuele Viola, et al. 3SUM, 3XOR, Triangles, 2013, Electron. Colloquium Comput. Complex.

[18] Jonathan W. Berry, et al. Tolerating the community detection resolution limit with edge weighting, 2008, Physical review. E, Statistical, nonlinear, and soft matter physics.

[19] Jeffrey Scott Vitter, et al. Algorithms and Data Structures for External Memory, 2008, Found. Trends Theor. Comput. Sci.

[20] Sergei Vassilvitskii, et al. Counting triangles and the curse of the last reducer, 2011, WWW.

[21] Rasmus Pagh, et al. The Input/Output Complexity of Sparse Matrix Multiplication, 2014, ESA.

[22] Bruno Menegola. An External Memory Algorithm for Listing Triangles, 2010.

[23] Francesco Silvestri. Subgraph Enumeration in Massive Graphs, 2014, ArXiv.

[24] Mihail N. Kolountzakis, et al. Efficient Triangle Counting in Large Graphs via Degree-Based Vertex Partitioning, 2010, Internet Math.