Why do simple algorithms for triangle enumeration work in the real world?

Triangle enumeration is a fundamental graph operation. Despite the lack of provably efficient (linear, or slightly super-linear) worst-case algorithms for this problem, practitioners run simple, efficient heuristics to find all triangles in graphs with millions of vertices. How are these heuristics exploiting the structure of these special graphs to provide major speedups in running time? We study one of the most prevalent algorithms used by practitioners. A trivial algorithm enumerates all paths of length 2, and checks whether each such path is part of a triangle. A good heuristic is to enumerate only those paths of length 2 where the middle vertex has the lowest degree. It is easily implemented and is empirically known to give remarkable speedups over the trivial algorithm. We study the behavior of this algorithm over graphs with heavy-tailed degree distributions, a defining feature of real-world graphs. The erased configuration model (ECM) efficiently generates a graph with asymptotically (almost) any desired degree sequence. We show that the expected running time of this algorithm over the distribution of graphs created by the ECM is controlled by the ℓ_{4/3}-norm of the degree sequence. As a corollary of our main theorem, we prove expected linear-time performance for degree sequences following a power law with exponent α ≥ 7/3, and non-trivial speedup whenever α ∈ (2, 3).
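
The two procedures described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names (`build_adjacency`, `triangles_trivial`, `triangles_min_degree`), the edge-list input format, and the tie-breaking rule (equal degrees are ordered by vertex label, so labels are assumed to be comparable, e.g. integers) are all assumptions made here. Each function also returns the number of length-2 paths it examined, so the two can be compared on the same input.

```python
from collections import defaultdict
from itertools import combinations


def build_adjacency(edges):
    """Undirected simple graph as a dict of neighbor sets (self-loops dropped)."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def triangles_trivial(edges):
    """Enumerate every path u - v - w of length 2 and test whether (u, w) is an edge.

    Every triangle is seen three times (once per middle vertex), so results
    are collected in a set. Returns (triangles, paths_examined).
    """
    adj = build_adjacency(edges)
    found = set()
    paths = 0
    for v in adj:
        for u, w in combinations(adj[v], 2):
            paths += 1
            if w in adj[u]:
                found.add(frozenset((u, v, w)))
    return found, paths


def triangles_min_degree(edges):
    """Enumerate only paths whose middle vertex has the lowest degree of the three.

    Assumption: ties in degree are broken by vertex label, which makes the
    ordering strict and reports each triangle exactly once.
    Returns (triangles, paths_examined).
    """
    adj = build_adjacency(edges)

    def rank(x):
        return (len(adj[x]), x)

    # For each vertex, keep only the neighbors that rank strictly higher.
    higher = {v: [u for u in adj[v] if rank(u) > rank(v)] for v in adj}

    found = []
    paths = 0
    for v in adj:
        for u, w in combinations(higher[v], 2):
            paths += 1
            if w in adj[u]:
                found.append((v, u, w))
    return found, paths
```

On a star with hub 0 and leaves 1 to 5 plus the extra edge (1, 2), both functions find the single triangle {0, 1, 2}, but the trivial version examines 12 paths while the heuristic examines 1: the high-degree hub is never used as a middle vertex. This is the effect the abstract attributes to heavy-tailed degree sequences, shown here only on a toy input.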
