Vertex Priority Based Butterfly Counting for Large-scale Bipartite Networks

Bipartite networks are of great importance in many real-world applications. In a bipartite network, the butterfly (i.e., a complete 2 × 2 biclique) is the smallest non-trivial cohesive structure and plays a key role. In this paper, we study the problem of efficiently counting the butterflies in a bipartite network. The most efficient existing techniques for this recently studied problem are based on enumerating wedges, which is the dominant cost of counting butterflies. Nevertheless, the existing algorithms can hardly handle large-scale bipartite networks, which makes butterfly counting a bottleneck in large-scale applications. Instead of the existing layer-priority-based techniques, we propose a vertex-priority-based paradigm, BFC-VP, that enumerates far fewer wedges; this significantly improves on the time complexity of the state-of-the-art algorithm. We also present cache-aware strategies that further improve time efficiency while retaining the theoretical time complexity of BFC-VP. Our techniques resolve the issue that the existing techniques cannot finish on some real datasets. On the real datasets where the existing techniques can finish, extensive empirical studies demonstrate that our techniques are faster than the state-of-the-art techniques by up to two orders of magnitude.
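The vertex-priority idea can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not define the priority, so the rule used here (higher degree means higher priority, ties broken by vertex label) is an assumption, and the function name `count_butterflies` and the `("L", u)` / `("R", v)` vertex tagging are hypothetical choices made for the example. The sketch assumes labels within one layer are mutually comparable. Each butterfly is counted exactly once, from its highest-priority vertex, by grouping the wedges that start there by their end vertex.

```python
from collections import defaultdict


def count_butterflies(edges):
    """Count butterflies (2 x 2 bicliques) in a bipartite graph.

    edges: iterable of (u, v) pairs, with u in the left layer and v in the
    right layer. Labels may overlap across layers; they are tagged below.
    """
    # Build an undirected adjacency structure with layer-tagged vertices.
    adj = defaultdict(set)
    for u, v in edges:
        left, right = ("L", u), ("R", v)
        adj[left].add(right)
        adj[right].add(left)

    # Assumed priority: larger degree ranks higher, ties broken by label.
    ordered = sorted(adj, key=lambda x: (len(adj[x]), x))
    priority = {x: rank for rank, x in enumerate(ordered)}

    total = 0
    for start in adj:
        # wedge_count[end] = number of wedges start - middle - end in which
        # both middle and end have lower priority than start.
        wedge_count = defaultdict(int)
        for middle in adj[start]:
            if priority[middle] >= priority[start]:
                continue
            for end in adj[middle]:
                if priority[end] < priority[start]:
                    wedge_count[end] += 1
        # Any two wedges sharing the same end vertex form one butterfly.
        for c in wedge_count.values():
            total += c * (c - 1) // 2
    return total


if __name__ == "__main__":
    # A single 2 x 2 biclique contains exactly one butterfly.
    print(count_butterflies([(0, 0), (0, 1), (1, 0), (1, 1)]))
```

Because wedges are only expanded from a start vertex toward lower-priority vertices, high-degree vertices are never used as wedge middles or ends for lower-priority starts, which is the intuition behind enumerating fewer wedges than a layer-based traversal.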
