A Space-Efficient Parallel Algorithm for Counting Exact Triangles in Massive Networks

Finding the number of triangles in a network (graph) is an important problem in mining and analysis of complex networks. Massive networks emerging from numerous application areas pose a significant challenge in network analytics since these networks consist of millions, or even billions, of nodes and edges. Such massive networks necessitate the development of efficient parallel algorithms. There exist several MapReduce and an only MPI (Message Passing Interface) based distributed-memory parallel algorithms for counting triangles. MapReduce based algorithms generate prohibitively large intermediate data. The MPI based algorithm can work on quite large networks, however, the overlapping partitions employed by the algorithm limit its capability to deal with very massive networks. In this paper, we present a space-efficient MPI based parallel algorithm for counting exact number of triangles in massive networks. The algorithm divides the network into non-overlapping partitions. Our results demonstrate up to 25-fold space saving over the algorithm with overlapping partitions. This space efficiency allows the algorithm to deal with networks which are 25 times larger. We present a novel approach that reduces communication cost drastically (up to 90%) leading to both a space-and runtime-efficient algorithm. Our adaptation of a parallel partitioning scheme by computing a novel weight function adds further to the efficiency of the algorithm. Denoting average degree of nodes and the number of partitions by d and P, respectively, our algorithm achieves up to O(P2)-factor space efficiency over existing MapReduce based algorithms and up to d-factor (approx.) over the algorithm with overlapping partitioning.

[1]  Hosung Park,et al.  What is Twitter, a social network or a news media? , 2010, WWW '10.

[2]  M E J Newman,et al.  Community structure in social and biological networks , 2001, Proceedings of the National Academy of Sciences of the United States of America.

[3]  Madhav V. Marathe,et al.  PATRIC: a parallel algorithm for counting triangles in massive networks , 2013, CIKM.

[4]  Ziv Bar-Yossef,et al.  Reductions in streaming algorithms, with an application to counting triangles in graphs , 2002, SODA '02.

[5]  Matthieu Latapy,et al.  Main-memory triangle computations for very large (sparse (power-law)) graphs , 2008, Theor. Comput. Sci..

[6]  Sergei Vassilvitskii,et al.  Counting triangles and the curse of the last reducer , 2011, WWW.

[7]  Rasmus Pagh,et al.  MapReduce Triangle Enumeration With Guarantees , 2014, CIKM.

[8]  Albert,et al.  Emergence of scaling in random networks , 1999, Science.

[9]  Jean-Pierre Eckmann,et al.  Curvature of co-links uncovers hidden thematic layers in the World Wide Web , 2001, Proceedings of the National Academy of Sciences of the United States of America.

[10]  Andrei Z. Broder,et al.  Graph structure in the Web , 2000, Comput. Networks.

[11]  Lars Backstrom,et al.  The Anatomy of the Facebook Social Graph , 2011, ArXiv.

[12]  Tamara G. Kolda,et al.  Triadic Measures on Graphs: The Power of Wedge Sampling , 2012, SDM.

[13]  Luca Becchetti,et al.  Efficient semi-streaming algorithms for local triangle counting in massive graphs , 2008, KDD.

[14]  Josep-Lluís Larriba-Pey,et al.  Shaping communities out of triangles , 2012, CIKM.

[15]  Tamara G. Kolda,et al.  Counting Triangles in Massive Graphs with MapReduce , 2013, SIAM J. Sci. Comput..

[16]  Chin-Wan Chung,et al.  An efficient MapReduce algorithm for counting triangles in a very large graph , 2013, CIKM.

[17]  Madhav V. Marathe,et al.  Generation and analysis of large synthetic social contact networks , 2009, Proceedings of the 2009 Winter Simulation Conference (WSC).

[18]  David F. Gleich,et al.  Vertex neighborhoods, low conductance cuts, and good seeds for local community methods , 2012, KDD.

[19]  James Cheng,et al.  Triangle listing in massive networks and its applications , 2011, KDD.

[20]  Thomas Schank,et al.  Algorithmic Aspects of Triangle-Based Network Analysis , 2007 .