Efficient Triangle Counting in Large Graphs via Degree-Based Vertex Partitioning

In this paper we present an efficient triangle counting algorithm that can be adapted to the semistreaming model [12]. The key idea is to combine the sampling algorithm of [31,32] with the partitioning of the vertex set into a high-degree and a low-degree subset, as in [1], and to treat each subset appropriately. We obtain a running time of \(O \left( m + \frac{m^{3/2} \Delta \log{n} }{t \epsilon^2} \right)\) and an \(\epsilon\)-approximation (multiplicative error), where n is the number of vertices, m the number of edges, t the number of triangles, and Δ the maximum number of triangles in which any single edge is contained. Furthermore, we show how the algorithm can be adapted to the semistreaming model with space usage \(O\left(m^{1/2}\log{n} + \frac{m^{3/2} \Delta \log{n}}{t \epsilon^2} \right)\) and a constant number of passes (three) over the graph stream. We apply our methods to various networks with several million edges and obtain excellent results. Finally, we propose a random-projection-based method for triangle counting and provide a sufficient condition for obtaining an estimate with low variance.
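The two ingredients named in the abstract, degree-based vertex partitioning and edge sampling, can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not reproduce the stated running time or space bounds. The \(\sqrt{m}\) degree cutoff, the uniform edge-sampling estimator rescaled by \(1/p^3\), and all function names are assumptions chosen for illustration only.

```python
import math
import random
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map, ignoring self-loops and duplicates."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def count_triangles_partitioned(edges):
    """Exact triangle count that handles low- and high-degree vertices separately."""
    adj = build_adjacency(edges)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    threshold = math.sqrt(m)  # assumed cutoff between "low" and "high" degree
    low = {v for v, nbrs in adj.items() if len(nbrs) <= threshold}
    high = [v for v in adj if v not in low]
    order = {v: i for i, v in enumerate(adj)}  # fixed order to break ties

    count = 0
    # Triangles with at least one low-degree vertex: enumerate neighbor pairs
    # of each low-degree vertex and charge the triangle to its earliest
    # low-degree vertex, so it is counted exactly once.
    for v in low:
        for a, b in combinations(adj[v], 2):
            if b not in adj[a]:
                continue
            if all(w not in low or order[w] > order[v] for w in (a, b)):
                count += 1

    # Triangles whose three vertices are all high-degree: there are at most
    # 2*sqrt(m) such vertices, so checking all triples is affordable.
    for a, b, c in combinations(high, 3):
        if b in adj[a] and c in adj[a] and c in adj[b]:
            count += 1
    return count


def estimate_triangles(edges, p, seed=None):
    """Keep each edge independently with probability p, count, rescale by 1/p^3.

    Assumes `edges` lists each undirected edge once.
    """
    rng = random.Random(seed)
    kept = [e for e in edges if rng.random() < p]
    return count_triangles_partitioned(kept) / p**3
```

A triangle survives the sampling step only if all three of its edges are kept, which happens with probability \(p^3\), so dividing the sampled count by \(p^3\) gives an unbiased estimate. With p = 1 the estimator reduces to the exact count.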

[1]  A. Barabasi,et al.  Functional and topological characterization of protein interaction networks , 2004, Proteomics.

[2]  Duncan J. Watts,et al.  Collective dynamics of ‘small-world’ networks , 1998, Nature.

[3]  Christian Sohler,et al.  Counting triangles in data streams , 2006, PODS.

[4]  Danyel Fisher,et al.  Visualizing the Signatures of Social Roles in Online Discussion Groups , 2007, J. Soc. Struct..

[5]  F. Heider Attitudes and cognitive organization , 1946, The Journal of psychology.

[6]  Dimitris Achlioptas,et al.  Fast computation of low rank matrix approximations , 2001, STOC '01.

[7]  Christos Faloutsos,et al.  DOULION: counting triangles in massive graphs with a coin , 2009, KDD.

[8]  Christos Faloutsos,et al.  Spectral counting of triangles via element-wise sparsification and triangle-based link recommendation , 2011, Social Network Analysis and Mining.

[9]  Michael G. Neubauer,et al.  Sum of squares of degrees in a graph , 2008, arXiv:0808.2234.

[10]  Christos Faloutsos,et al.  Spectral Counting of Triangles in Power-Law Networks via Element-Wise Sparsification , 2009, 2009 International Conference on Advances in Social Network Analysis and Mining.

[11]  Garry Robins,et al.  An introduction to exponential random graph (p*) models for social networks , 2007, Soc. Networks.

[12]  Luca Becchetti,et al.  Efficient semi-streaming algorithms for local triangle counting in massive graphs , 2008, KDD.

[13]  Jean-Pierre Eckmann,et al.  Curvature of co-links uncovers hidden thematic layers in the World Wide Web , 2001, Proceedings of the National Academy of Sciences of the United States of America.

[14]  Norishige Chiba,et al.  Arboricity and Subgraph Listing Algorithms , 1985, SIAM J. Comput..

[15]  Noga Alon,et al.  Finding and counting given length cycles , 1997, Algorithmica.

[16]  Andrei Z. Broder,et al.  Workshop on Algorithms and Models for the Web Graph , 2007, WAW.

[17]  Dorothea Wagner,et al.  Finding, Counting and Listing All Triangles in Large Graphs, an Experimental Study , 2005, WEA.

[18]  Alan M. Frieze,et al.  Clustering in large graphs and matrices , 1999, SODA '99.

[19]  Anthony Bonato,et al.  Models of Online Social Networks , 2009, Internet Math..

[20]  Mihail N. Kolountzakis,et al.  Efficient Triangle Counting in Large Graphs via Degree-Based Vertex Partitioning , 2012, Internet Math..

[21]  Jonathan W. Berry,et al.  Tolerating the community detection resolution limit with edge weighting. , 2008, Physical review. E, Statistical, nonlinear, and soft matter physics.

[22]  Dorothea Wagner,et al.  Approximating Clustering Coefficient and Transitivity , 2005, J. Graph Algorithms Appl..

[23]  Don Coppersmith,et al.  Matrix multiplication via arithmetic progressions , 1987, STOC.

[24]  J. Wrench Table errata: The art of computer programming, Vol. 2: Seminumerical algorithms (Addison-Wesley, Reading, Mass., 1969) by Donald E. Knuth , 1970 .

[25]  Kevin Lewis,et al.  Beyond and Below Racial Homophily: ERG Models of a Friendship Network Documented on Facebook , 2010, American Journal of Sociology.

[26]  Krishna P. Gummadi,et al.  Measurement and analysis of online social networks , 2007, IMC '07.

[27]  H. Avron Counting Triangles in Large Graphs using Randomized Matrix Trace Estimation , 2010 .

[28]  Alon Itai,et al.  Finding a minimum circuit in a graph , 1977, STOC '77.

[29]  S. L. Wong,et al.  A Map of the Interactome Network of the Metazoan C. elegans , 2004, Science.

[30]  S H Strogatz,et al.  Random graph models of social networks , 2002, Proceedings of the National Academy of Sciences of the United States of America.

[31]  Ziv Bar-Yossef,et al.  Reductions in streaming algorithms, with an application to counting triangles in graphs , 2002, SODA '02.

[32]  Mark E. J. Newman,et al.  The Structure and Function of Complex Networks , 2003, SIAM Rev..

[33]  Jacques Rougemont,et al.  DNA microarray data and contextual analysis of correlation graphs , 2003, BMC Bioinformatics.

[34]  Mohammad Ghodsi,et al.  New Streaming Algorithms for Counting Triangles in Graphs , 2005, COCOON.

[35]  Mihalis Yannakakis,et al.  The Clique Problem for Planar Graphs , 1981, Inf. Process. Lett..

[36]  Charalampos E. Tsourakakis MACH: Fast Randomized Tensor Decompositions , 2009, SDM.

[37]  Avner Magen,et al.  Near Optimal Dimensionality Reductions That Preserve Volumes , 2008, APPROX-RANDOM.

[38]  Van H. Vu,et al.  Concentration of Multivariate Polynomials and Its Applications , 2000, Comb..

[39]  Guy Bresler,et al.  Mixing Time of Exponential Random Graphs , 2008, 2008 49th Annual IEEE Symposium on Foundations of Computer Science.

[40]  Fan Chung Graham,et al.  The Spectra of Random Graphs with Given Expected Degrees , 2004, Internet Math..

[41]  Sanjay Ghemawat,et al.  MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.

[42]  F. Chung,et al.  Complex Graphs and Networks , 2006 .

[43]  C. V. Eynden,et al.  A proof of a conjecture of Erdös , 1969 .

[44]  H. Chernoff A Note on an Inequality Involving the Normal Distribution , 1981 .

[45]  Alan M. Frieze,et al.  Min-Wise Independent Permutations , 2000, J. Comput. Syst. Sci..

[46]  Christos Faloutsos,et al.  PEGASUS: A Peta-Scale Graph Mining System Implementation and Observations , 2009, 2009 Ninth IEEE International Conference on Data Mining.

[47]  R. Ahlswede,et al.  Graphs with maximal number of adjacent pairs of edges , 1978 .

[48]  W. B. Johnson,et al.  Extensions of Lipschitz mappings into Hilbert space , 1984 .

[49]  Jure Leskovec,et al.  Microscopic evolution of social networks , 2008, KDD.

[50]  Christoph M. Hoffmann,et al.  A graph-constructive approach to solving systems of geometric constraints , 1997, TOGS.

[51]  Tamás Sarlós,et al.  Improved Approximation Algorithms for Large Matrices via Random Projections , 2006, 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06).

[52]  Colin Cooper,et al.  Randomization and Approximation Techniques in Computer Science , 1999, Lecture Notes in Computer Science.

[53]  Jure Leskovec,et al.  Predicting positive and negative links in online social networks , 2010, WWW '10.

[54]  Joan Feigenbaum,et al.  On graph problems in a semi-streaming model , 2005, Theor. Comput. Sci..

[55]  Noga Alon,et al.  The space complexity of approximating the frequency moments , 1996, STOC '96.

[56]  Matthieu Latapy,et al.  Main-memory triangle computations for very large (sparse (power-law)) graphs , 2008, Theor. Comput. Sci..

[57]  Mihail N. Kolountzakis,et al.  Triangle Sparsifiers , 2011, J. Graph Algorithms Appl..

[58]  Charalampos E. Tsourakakis,et al.  Colorful triangle counting and a MapReduce implementation , 2011, Inf. Process. Lett..

[59]  Mihail N. Kolountzakis,et al.  Approximate Triangle Counting , 2009, ArXiv.

[60]  Ana Paula Appel,et al.  Radius Plots for Mining Tera-byte Scale Graphs: Algorithms, Patterns, and Observations , 2010, SDM.

[61]  Charalampos E. Tsourakakis Large Scale Graph Mining with MapReduce: Diameter Estimation and Eccentricity Plots of Massive Graphs with Mining Applications , 2012, SNA-KDD 2012.

[62]  Charalampos E. Tsourakakis Fast Counting of Triangles in Large Real Networks without Counting: Algorithms and Laws , 2008, 2008 Eighth IEEE International Conference on Data Mining.

[63]  Van H. Vu On the concentration of multivariate polynomials with small expectation , 1999, Random Struct. Algorithms.

[64]  S. Wasserman,et al.  Logit models and logistic regressions for social networks: I. An introduction to Markov graphs and p* , 1996 .