Efficient Butterfly Counting for Large Bipartite Networks

Bipartite networks are of great importance in many real-world applications. In bipartite networks, the butterfly (i.e., a complete 2 x 2 biclique) is the smallest non-trivial cohesive structure and plays a key role. In this paper, we study the problem of efficiently counting the number of butterflies in bipartite networks. The most advanced techniques are based on enumerating wedges, which is the dominant cost of counting butterflies. Nevertheless, the existing algorithms cannot efficiently handle large-scale bipartite networks, which becomes a bottleneck in large-scale applications. Instead of the existing layer-priority-based techniques, we propose a vertex-priority-based paradigm, BFC-VP, that enumerates far fewer wedges; this significantly improves on the time complexity of the state-of-the-art algorithms. In addition, we present cache-aware strategies that further improve time efficiency while theoretically retaining the time complexity of BFC-VP. We also show that the proposed techniques work efficiently in external-memory and parallel contexts. Our extensive empirical studies demonstrate that the proposed techniques speed up the state-of-the-art techniques by up to two orders of magnitude on real datasets.
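
As an illustration of wedge-based counting under a vertex priority, the following is a minimal sketch and not the paper's implementation. It assumes that priority is given by degree with ties broken by vertex id, which the abstract does not specify, and the function name `count_butterflies` and the edge-list input format are hypothetical. For each start vertex, it enumerates only wedges whose middle and end vertices both have lower priority, so each butterfly is counted once, at its highest-priority vertex.

```python
from collections import defaultdict


def count_butterflies(edges):
    """Count 2 x 2 bicliques in a bipartite graph given as (left, right) pairs.

    Assumption (not stated in the abstract): a vertex's priority is its
    degree, with ties broken by a fixed ordering of vertex ids.
    """
    # Tag each side so that left and right ids cannot collide.
    adj = defaultdict(set)
    for left, right in edges:
        a, b = ("U", left), ("V", right)
        adj[a].add(b)
        adj[b].add(a)

    # rank[x] is the position of x when sorted by (degree, id).
    order = sorted(adj, key=lambda x: (len(adj[x]), x))
    rank = {x: i for i, x in enumerate(order)}

    total = 0
    for start in adj:
        # wedges[end] = number of wedges start - mid - end in which
        # both mid and end have lower priority than start.
        wedges = defaultdict(int)
        for mid in adj[start]:
            if rank[mid] >= rank[start]:
                continue
            for end in adj[mid]:
                if rank[end] < rank[start]:
                    wedges[end] += 1
        # Any two wedges sharing the same endpoints form one butterfly.
        for c in wedges.values():
            total += c * (c - 1) // 2
    return total


if __name__ == "__main__":
    # A single complete 2 x 2 biclique.
    print(count_butterflies([(0, 0), (0, 1), (1, 0), (1, 1)]))
```

Skipping wedges that pass through a higher-priority vertex is what keeps high-degree vertices from being used as wedge middles repeatedly; the cache-aware, external-memory, and parallel variants mentioned in the abstract are not covered by this sketch.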
