Triangle counting in large networks: a review

Counting and enumerating local topological structures, such as triangles, is an important task in the analysis of large real-life networks. For instance, the triangle count of a network is used to compute transitivity, an important property for understanding how a graph evolves over time. Triangles are also used in other tasks on real-life networks, including community discovery, link prediction, and spam filtering. Triangle counting, though simple to state, has gained wide attention from the data mining community in recent years, because most existing algorithms for counting triangles do not scale well to very large networks with millions (or even billions) of vertices. To circumvent this limitation, researchers have proposed triangle counting methods that approximate the count or run on distributed clusters. In this paper, we discuss the existing methods of triangle counting, ranging from sequential to parallel, single-machine to distributed, exact to approximate, and off-line to streaming. We also present experimental results comparing the performance of a set of approximate triangle counting methods built under a unified implementation framework. Finally, we conclude with a discussion of future work in this direction. WIREs Data Mining Knowl Discov 2018, 8:e1226. doi: 10.1002/widm.1226
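To make the link between triangle count and transitivity concrete, the following is a minimal sketch and not one of the methods surveyed in the paper. It assumes the standard definition of transitivity as three times the number of triangles divided by the number of wedges (paths of length two), and it counts triangles exactly by intersecting neighbor sets. The function names `build_adjacency`, `count_triangles`, and `transitivity` are illustrative only.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected, simple adjacency map: vertex -> set of neighbors."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    return dict(adj)


def count_triangles(adj):
    """Exact triangle count via neighbor-set intersection.

    Each triangle {u, v, w} is counted once by requiring u < v < w.
    """
    total = 0
    for u, neighbors in adj.items():
        for v in neighbors:
            if u < v:
                total += sum(1 for w in neighbors & adj[v] if w > v)
    return total


def transitivity(adj):
    """Transitivity = 3 * (number of triangles) / (number of wedges)."""
    wedges = sum(len(n) * (len(n) - 1) // 2 for n in adj.values())
    if wedges == 0:
        return 0.0
    return 3 * count_triangles(adj) / wedges


# Small example: one triangle (0, 1, 2) plus a pendant edge (2, 3).
example = build_adjacency([(0, 1), (1, 2), (0, 2), (2, 3)])
print(count_triangles(example))  # 1
print(transitivity(example))     # 0.6
```

The cost of this exact approach grows with the sizes of the neighbor sets being intersected, which illustrates why approximate and distributed methods become attractive on very large networks.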
