Retrieving Top Weighted Triangles in Graphs

Pattern counting in graphs is a fundamental primitive for many network analysis tasks, and there are several methods for scaling subgraph counting to large graphs. Many real-world networks have a notion of strength of connection between nodes, which is often modeled by a weighted graph, but existing scalable algorithms for pattern mining are designed for unweighted graphs. Here, we develop deterministic and random sampling algorithms that enable the fast discovery of the 3-cliques (triangles) of largest weight, where a triangle's weight is measured by the generalized mean of its edge weights. For example, one of our proposed algorithms can find the top-1000 weighted triangles of a weighted graph with billions of edges in thirty seconds on a commodity server, which is orders of magnitude faster than existing "fast" enumeration schemes. Our methods open the door to scalable pattern mining in weighted graphs.
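To make the problem statement concrete, the following is a minimal sketch of the brute-force baseline that such algorithms aim to beat: enumerate every triangle, score it by the generalized (power) mean of its three edge weights, and keep the k heaviest. It is not one of the deterministic or sampling algorithms described above. The function names, the `(u, v, weight)` edge format, and the exponent parameter `p` are assumptions chosen for illustration, and edge weights are assumed to be positive.

```python
import heapq
from itertools import combinations


def generalized_mean(weights, p):
    """Generalized (power) mean of positive weights.

    p = 1 gives the arithmetic mean; p = 0 is treated as the
    geometric-mean limit.
    """
    n = len(weights)
    if p == 0:
        prod = 1.0
        for w in weights:
            prod *= w
        return prod ** (1.0 / n)
    return (sum(w ** p for w in weights) / n) ** (1.0 / p)


def top_k_triangles(edges, k, p=1.0):
    """Brute-force baseline (illustrative only): list all triangles and
    return the k with the largest generalized-mean edge weight.

    edges: iterable of (u, v, weight) for an undirected graph whose
           node labels are mutually comparable (e.g. integers).
    """
    # Build a symmetric adjacency map: adj[u][v] = weight of edge {u, v}.
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w

    heap = []  # min-heap holding at most k (weight, triangle) pairs
    for u in adj:
        # Only look at neighbors larger than u so that each triangle
        # (u < v < x) is visited exactly once.
        higher = sorted(n for n in adj[u] if n > u)
        for v, x in combinations(higher, 2):
            if x in adj[v]:
                wt = generalized_mean([adj[u][v], adj[u][x], adj[v][x]], p)
                item = (wt, (u, v, x))
                if len(heap) < k:
                    heapq.heappush(heap, item)
                elif item > heap[0]:
                    heapq.heapreplace(heap, item)
    return sorted(heap, reverse=True)


if __name__ == "__main__":
    toy_edges = [(1, 2, 4.0), (2, 3, 4.0), (1, 3, 4.0),
                 (3, 4, 1.0), (2, 4, 1.0)]
    for weight, triangle in top_k_triangles(toy_edges, k=2):
        print(triangle, weight)
```

Full enumeration like this touches every triangle in the graph, which is the cost that the faster top-k methods are meant to avoid on graphs with billions of edges.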
