REPT: A Streaming Algorithm of Approximating Global and Local Triangle Counts in Parallel

Recently, considerable effort has been devoted to approximately computing the global and local (i.e., incident to each node) triangle counts of a large graph stream represented as a sequence of edges. Existing approximate triangle counting algorithms rely on sampling techniques to reduce the computational cost. However, their estimation errors depend heavily on the covariance between sampled triangles. Moreover, little attention has been paid to developing parallel one-pass streaming algorithms that can quickly approximate triangle counts on a multi-core machine or a cluster of machines. To solve these problems, we develop REPT, a novel parallel method that significantly reduces the covariance between sampled triangles and, in some cases, eliminates it completely. We theoretically prove that REPT is more accurate than directly parallelizing existing triangle count estimation algorithms. We also conduct extensive experiments on a variety of real-world graphs, and the results demonstrate that REPT is several times more accurate than state-of-the-art methods.
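To make the baseline concrete, the following is a minimal Python sketch of a generic edge-sampling, one-pass estimator for global and local triangle counts (similar in spirit to published methods such as MASCOT), together with the "direct" parallelization the abstract compares against: c independent copies whose estimates are averaged. This is not REPT. The function names `edge_sampling_estimate` and `parallel_average` are hypothetical, and the stream is assumed to be a simple graph with no duplicate edges or self-loops.

```python
import random
from collections import defaultdict


def edge_sampling_estimate(edges, p, seed=None):
    """One-pass estimator (hypothetical baseline, not REPT).

    Each arriving edge (u, v) first closes a triangle with every common
    neighbor w of u and v in the sampled graph. Such a triangle is seen only
    if both of its earlier edges were kept, which happens with probability
    p**2, so each one contributes 1/p**2 to keep the counts unbiased.
    The arriving edge is then kept independently with probability p.
    """
    rng = random.Random(seed)
    adj = defaultdict(set)          # adjacency sets of the sampled graph
    weight = 1.0 / (p * p)
    global_est = 0.0
    local_est = defaultdict(float)
    for u, v in edges:
        for w in adj[u] & adj[v]:   # triangles closed by edge (u, v)
            global_est += weight
            local_est[u] += weight
            local_est[v] += weight
            local_est[w] += weight
        if rng.random() < p:        # sample the edge after counting
            adj[u].add(v)
            adj[v].add(u)
    return global_est, dict(local_est)


def parallel_average(edges, p, c, base_seed=0):
    """Direct parallelization: average c independent estimator copies.

    The copies share no state, so each could run on its own core or
    machine; they run in a plain loop here only for simplicity.
    """
    results = [edge_sampling_estimate(edges, p, seed=base_seed + i)
               for i in range(c)]
    global_avg = sum(g for g, _ in results) / c
    local_avg = defaultdict(float)
    for _, local in results:
        for node, value in local.items():
            local_avg[node] += value / c
    return global_avg, dict(local_avg)


# K4 contains 4 triangles, and every node lies in 3 of them.
k4 = [(0, 1), (0, 2), (1, 2), (0, 3), (1, 3), (2, 3)]
print(parallel_average(k4, p=1.0, c=3))
```

With `p=1.0` every edge is kept and the estimate is exact. With `p < 1`, two triangles that share a sampled edge tend to be counted together, so their contributions are positively correlated; this is the within-sample covariance the abstract refers to. Averaging c independent copies divides the overall variance by c but leaves that covariance structure inside each copy unchanged, which is the gap REPT is described as targeting.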
