Compact group discovery in attributed graphs and social networks

Abstract: Social networks and many other graphs are attributed, meaning that their nodes are labelled with textual information such as personal data, expertise or interests. A common analysis task on attributed graphs is to find subgraphs whose nodes contain a given set of keywords. In many applications, the size of the subgraph should be limited (e.g., a subgraph with thousands of nodes is not desired). In this work, we introduce the problem of compact attributed group (AG) discovery. Given a set of query keywords and a desired solution size, the task is to find subgraphs with the desired number of nodes, such that the nodes are closely connected and each node contains as many query keywords as possible. We prove that finding an optimal solution is NP-hard, and we propose approximation algorithms with a guaranteed approximation ratio of two. Since the number of qualifying AGs may be large, we also show how to find approximate top-k AGs with polynomial delay. Finally, we experimentally verify the effectiveness and efficiency of our techniques on real-world graphs.
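To make the problem statement concrete, the following is a minimal brute-force sketch on a toy graph. It is not the approximation algorithm proposed in this work. The scoring is an assumption made for illustration only: "closely connected" is measured by the largest pairwise shortest-path distance in the group, keyword coverage by the fraction of query keywords missing per node, and the two are combined by a weighted sum with a hypothetical weight `alpha`. The function names and the toy data are likewise hypothetical.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, source):
    """Hop distances from source to every reachable node (unweighted graph)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def best_group(adj, keywords, query, size, alpha=0.5):
    """Exhaustively search for the lowest-cost group of exactly `size` nodes.

    Assumed cost (illustrative, not taken from the paper):
        alpha * diameter + (1 - alpha) * missing_fraction
    where diameter is the largest pairwise distance in the whole graph between
    group members, and missing_fraction is the share of (node, query keyword)
    pairs in which the node lacks the keyword.
    """
    query = set(query)
    dist = {u: bfs_distances(adj, u) for u in adj}
    best = None
    for group in combinations(sorted(adj), size):
        pairs = list(combinations(group, 2))
        # Skip groups containing nodes that cannot reach each other.
        if any(v not in dist[u] for u, v in pairs):
            continue
        diameter = max((dist[u][v] for u, v in pairs), default=0)
        missing = sum(len(query - keywords[u]) for u in group)
        missing_fraction = missing / (size * len(query))
        cost = alpha * diameter + (1 - alpha) * missing_fraction
        if best is None or cost < best[0]:
            best = (cost, group)
    return best


# Toy attributed graph: a triangle a-b-c with a tail c-d-e.
adj = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}
keywords = {
    "a": {"db", "ml"},
    "b": {"db"},
    "c": {"db", "ml"},
    "d": {"ml"},
    "e": {"db", "ml"},
}

result = best_group(adj, keywords, query={"db", "ml"}, size=3)
print(result)
```

On this toy input the triangle `a`, `b`, `c` wins: its members are pairwise adjacent and only `b` lacks one keyword, whereas a group such as `a`, `c`, `e` covers every keyword but is spread across the graph. The enumeration examines every node subset of the requested size, which is only feasible for tiny graphs and is consistent with the NP-hardness result that motivates approximation algorithms.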
