Triadic Measures on Graphs: The Power of Wedge Sampling

Graphs are used to model interactions in a variety of contexts, and there is a growing need to quickly assess the structure of a graph. Some of the most useful graph metrics, especially those measuring social cohesion, are based on triangles. Despite the importance of these triadic measures, the associated algorithms can be extremely expensive. We propose a new method based on wedge sampling. This versatile technique allows fast and accurate approximation of all current variants of clustering coefficients and enables rapid uniform sampling of the triangles of a graph. Our methods come with provable and practical time-approximation tradeoffs for all computations. We provide extensive results showing that our methods are orders of magnitude faster than the state of the art while providing nearly the accuracy of full enumeration. Our results will enable wider adoption of triadic measures for the analysis of extremely large graphs, as demonstrated on several real-world examples.
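To make the idea concrete, the following is a minimal sketch of wedge sampling for the global clustering coefficient (the fraction of wedges that are closed into triangles). It is an illustration under stated assumptions, not the authors' implementation: the abstract does not spell out the procedure, so the sketch assumes the usual scheme of picking a wedge center with probability proportional to its wedge count, picking two distinct neighbors uniformly, and checking whether they are adjacent. The function names `build_adjacency`, `wedge_sample_transitivity`, and `hoeffding_sample_size` are hypothetical, and the sample-size rule is the standard Hoeffding bound for averages of 0/1 variables.

```python
import math
import random
from bisect import bisect_right
from itertools import accumulate


def build_adjacency(edges):
    """Build neighbor lists (for sampling) and neighbor sets (for lookups)
    from an undirected edge list. Self-loops and duplicate edges are dropped."""
    nbr_sets = {}
    for u, v in edges:
        if u == v:
            continue
        nbr_sets.setdefault(u, set()).add(v)
        nbr_sets.setdefault(v, set()).add(u)
    nbr_lists = {v: sorted(s) for v, s in nbr_sets.items()}
    return nbr_lists, nbr_sets


def hoeffding_sample_size(eps, delta):
    """Samples needed so the estimate is within eps of the true value
    with probability at least 1 - delta (Hoeffding's inequality)."""
    return math.ceil(0.5 * eps ** -2 * math.log(2.0 / delta))


def wedge_sample_transitivity(edges, k, seed=None):
    """Estimate the global clustering coefficient from k uniform wedges."""
    rng = random.Random(seed)
    nbr_lists, nbr_sets = build_adjacency(edges)

    # Only vertices of degree >= 2 are the center of at least one wedge.
    centers = [v for v, n in nbr_lists.items() if len(n) >= 2]
    if not centers:
        return 0.0

    # A vertex of degree d is the center of d * (d - 1) / 2 wedges.
    counts = [len(nbr_lists[v]) * (len(nbr_lists[v]) - 1) // 2 for v in centers]
    cumulative = list(accumulate(counts))
    total_wedges = cumulative[-1]

    closed = 0
    for _ in range(k):
        # Choose a center with probability proportional to its wedge count.
        r = rng.randrange(total_wedges)
        center = centers[bisect_right(cumulative, r)]
        # Choose two distinct neighbors uniformly: this is a uniform wedge.
        a, b = rng.sample(nbr_lists[center], 2)
        # The wedge is closed if its endpoints are themselves adjacent.
        if b in nbr_sets[a]:
            closed += 1
    return closed / k


if __name__ == "__main__":
    # A triangle (0, 1, 2) with one pendant edge (2, 3).
    edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
    k = hoeffding_sample_size(eps=0.05, delta=0.01)
    print(k, wedge_sample_transitivity(edges, k, seed=0))
```

In this sketch the number of samples depends only on the target error `eps` and failure probability `delta`, not on the size of the graph, which is what makes a time-approximation tradeoff of this kind attractive for very large graphs. The cumulative-sum lookup is one simple way to draw centers in proportion to their wedge counts; other weighted-sampling schemes would serve equally well.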
