CoCoS: Fast and Accurate Distributed Triangle Counting in Graph Streams

Given a graph stream, how can we estimate the number of triangles in it using multiple machines with limited storage? Specifically, how should edges be processed and sampled across the machines for rapid and accurate estimation? The count of triangles (i.e., cliques of size three) has proven useful in numerous applications, including anomaly detection, community detection, and link recommendation. For triangle counting in large and dynamic graphs, recent work has focused largely on streaming algorithms and on distributed algorithms, but little on combining them to get "the best of both worlds". In this work, we propose CoCoS, a fast and accurate distributed streaming algorithm for estimating the counts of global triangles (i.e., all triangles) and of local triangles incident to each node. Making one pass over the input stream, CoCoS carefully processes and stores the edges across multiple machines so that redundant use of computational and storage resources is minimized. Compared to baselines, CoCoS is (a) Accurate: giving up to 39× smaller estimation error, (b) Fast: up to 10.4× faster, scaling linearly with the size of the input stream, and (c) Theoretically sound: yielding unbiased estimates.
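To make "one pass" and "unbiased estimates" concrete, here is a minimal sketch of a generic single-machine baseline. It is not CoCoS, and it does not implement CoCoS's distributed edge processing. It uses fixed-probability edge sampling: before deciding whether to keep an arriving edge (u, v), it counts the triangles that edge closes with previously sampled edges. A triangle is discovered only if both of its earlier edges were kept, which happens with probability p², so weighting each discovery by 1/p² gives unbiased global and local estimates. The function name `sampled_triangle_estimates` and the assumption of a simple graph stream (no duplicate edges) are illustrative choices, not part of the original work.

```python
import random
from collections import defaultdict


def sampled_triangle_estimates(edge_stream, p, seed=0):
    """Generic one-pass triangle estimator with fixed-probability edge sampling.

    Hypothetical illustration, NOT CoCoS. Assumes a simple graph stream
    (each undirected edge appears at most once).
    Returns (global_estimate, {node: local_estimate}).
    """
    rng = random.Random(seed)
    adj = defaultdict(set)            # adjacency of the *sampled* edges only
    weight = 1.0 / (p * p)            # inverse probability both earlier edges were kept
    global_est = 0.0
    local_est = defaultdict(float)

    for u, v in edge_stream:
        if u == v:
            continue                  # self-loops cannot close a triangle
        # Count triangles closed by (u, v) using common sampled neighbors.
        for w in adj[u] & adj[v]:
            global_est += weight
            local_est[u] += weight
            local_est[v] += weight
            local_est[w] += weight
        # Only after counting, keep the edge with probability p.
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)

    return global_est, dict(local_est)


k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(sampled_triangle_estimates(k4, p=1.0))  # p = 1 keeps every edge, so counts are exact
```

With p = 1 every edge is kept and the counts are exact: the complete graph on four nodes has 4 triangles, and each node lies in 3 of them. With p < 1, each individual estimate is random, but its expectation equals the true count. Averaging independent copies run on separate machines would keep the estimate unbiased and reduce variance, although that naive scheme duplicates exactly the storage and computation that the abstract says CoCoS is designed to minimize.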
