On Asymptotic Cost of Triangle Listing in Random Graphs

Triangle listing has been a long-standing problem, with many heuristics, bounds, and experimental results, but little asymptotically accurate complexity analysis. To address this gap, we introduce a novel stochastic framework, based on Glivenko-Cantelli results for functions of order statistics, that allows modeling the cost of in-memory triangle enumeration in families of random graphs. Unlike prior work, which typically gives only O(·) bounds, we derive the exact limits of the CPU complexity of all vertex and edge iterators under arbitrary acyclic orientations as the graph size n → ∞. These results are obtained in simple closed form as functions of the degree distribution. This allows us to establish optimal orientations for all studied algorithms, compare them to each other, and discover the best technique within each class.
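To make the objects of this analysis concrete, the following is a minimal Python sketch, not the paper's implementation, of one vertex iterator under an acyclic orientation. It assumes an adjacency-set representation, orients each edge by ranking vertices on (degree, id), and uses the number of out-neighbour pairs tested as a stand-in for CPU cost. The names `orient` and `vertex_iterator`, the `descending` flag, and this particular cost proxy are illustrative assumptions; the paper's cost model and family of orientations may differ.

```python
from itertools import combinations


def orient(adj, descending=False):
    """Return out-neighbour sets for an acyclic orientation of a simple graph.

    Assumption: vertices are ranked by (degree, id) and each edge points from
    the lower-ranked endpoint to the higher-ranked one (reversed when
    descending=True). Orienting along any total order cannot create a cycle.
    """
    rank = {v: (len(nbrs), v) for v, nbrs in adj.items()}
    out = {v: set() for v in adj}
    for u, nbrs in adj.items():
        for v in nbrs:
            lower_to_higher = rank[u] < rank[v]
            if lower_to_higher != descending:
                out[u].add(v)
    return out


def vertex_iterator(adj, descending=False):
    """List each triangle once and count the adjacency tests performed.

    In an acyclic orientation every triangle has exactly one vertex whose two
    other vertices are both its out-neighbours, so scanning all out-neighbour
    pairs of every vertex finds each triangle exactly once. The number of
    pairs tested, sum over u of C(outdeg(u), 2), is used here as a simple
    (assumed) proxy for CPU cost.
    """
    out = orient(adj, descending)
    triangles, tests = [], 0
    for u in adj:
        for v, w in combinations(sorted(out[u]), 2):
            tests += 1
            if w in adj[v]:  # undirected adjacency check for the closing edge
                triangles.append(tuple(sorted((u, v, w))))
    return triangles, tests


# Hypothetical test graph: a star centred on vertex 0 with leaves 1..5,
# plus the edge 1-2, so the only triangle is {0, 1, 2}.
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (1, 2)]
adj = {v: set() for v in range(6)}
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)

print(vertex_iterator(adj))                   # ([(0, 1, 2)], 1)
print(vertex_iterator(adj, descending=True))  # ([(0, 1, 2)], 10)
```

Both orientations find the same triangle, but orienting from low to high degree tests one pair while the reverse tests ten, because all five of the hub's leaves become its out-neighbours. Gaps of this kind, driven by the degree distribution and the chosen orientation, are what the abstract says the framework characterizes in the limit.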