Efficient and Effective Community Search on Large-scale Bipartite Graphs

Bipartite graphs are widely used to model relationships between two types of entities. Community search, which retrieves densely connected subgraphs containing a query vertex, has been extensively studied on unipartite graphs, but it remains largely unexplored on bipartite graphs. Moreover, existing cohesive subgraph models on bipartite graphs measure only the structural cohesiveness between the two sets of vertices and overlook the role of edge weights in forming a community. In this paper, we study the significant (alpha, beta)-community search problem on weighted bipartite graphs. Given a query vertex q, we aim to find the significant (alpha, beta)-community R of q, which uses the (alpha, beta)-core to characterize the engagement level of vertices and maximizes the minimum edge weight (the significance) within R. To support fast retrieval of R, we first retrieve the maximal connected subgraph of the (alpha, beta)-core containing the query vertex (the (alpha, beta)-community), which limits the search space to a subgraph much smaller than the original graph. We present a novel index structure that can be built in O(delta * m) time and takes O(delta * m) space, where m is the number of edges in the graph G and delta is bounded by the square root of m and is much smaller in practice. Using the index, the (alpha, beta)-community can be retrieved in optimal time. To then obtain R, we develop a peeling algorithm, which shrinks the (alpha, beta)-community, and an expansion algorithm, which grows the result from the query vertex. Experimental results on real graphs demonstrate the effectiveness of the significant (alpha, beta)-community model and validate the efficiency of our query processing and indexing techniques.
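
To make the problem definition concrete, the following is a minimal Python sketch of a naive baseline. It is not the index-based retrieval or the peeling and expansion algorithms of the paper. It assumes that upper-layer vertices need degree at least alpha, that lower-layer vertices need degree at least beta, and that alpha and beta are at least 1. The function names, the edge-dictionary input format, and the ("U", name) / ("L", name) vertex tags are all hypothetical choices made for illustration.

```python
from collections import defaultdict, deque


def core_component(edges, alpha, beta, q):
    """Return the edges of the connected (alpha, beta)-core component containing q.

    edges: dict mapping (u, v) -> weight, with u in the upper layer and v in
           the lower layer (assumed input format).
    q:     ("U", u) or ("L", v); the tag keeps the two layers' names apart.
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[("U", u)].add(("L", v))
        adj[("L", v)].add(("U", u))

    def need(x):
        # Assumed convention: alpha bounds upper degrees, beta bounds lower degrees.
        return alpha if x[0] == "U" else beta

    # Repeatedly delete vertices that violate their degree constraint.
    queue = deque(x for x in adj if len(adj[x]) < need(x))
    removed = set(queue)
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            adj[y].discard(x)
            if y not in removed and len(adj[y]) < need(y):
                removed.add(y)
                queue.append(y)
        adj[x].clear()

    if q not in adj or q in removed:
        return {}

    # Breadth-first search over the surviving vertices, starting from q.
    seen = {q}
    queue = deque([q])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return {(u, v): w for (u, v), w in edges.items()
            if ("U", u) in seen and ("L", v) in seen}


def significant_community(edges, alpha, beta, q):
    """Naive threshold scan: keep raising the minimum allowed edge weight
    while q still belongs to a non-empty (alpha, beta)-core component."""
    community = core_component(edges, alpha, beta, q)
    best = {}
    for w in sorted(set(community.values())):
        kept = {e: x for e, x in community.items() if x >= w}
        candidate = core_component(kept, alpha, beta, q)
        if not candidate:
            break  # higher thresholds only remove more edges
        best = candidate
    return best


# Toy graph: a heavy 2x2 block, a lightly attached u3, and a pendant edge.
example_edges = {
    ("u1", "v1"): 5, ("u1", "v2"): 4,
    ("u2", "v1"): 4, ("u2", "v2"): 5,
    ("u3", "v1"): 1, ("u3", "v2"): 1,
    ("u4", "v3"): 9,
}
R = significant_community(example_edges, alpha=2, beta=2, q=("U", "u1"))
print(sorted(R), min(R.values()))
```

On this toy graph the (2, 2)-community of u1 has six edges, since the pendant edge (u4, v3) is peeled away, and the scan then discards the weight-1 edges of u3, leaving the four-edge block with significance 4. The scan recomputes the core once per distinct weight, which is exactly the kind of repeated work that the peeling and expansion strategies described in the abstract are meant to avoid.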
