Index-Based Densest Clique Percolation Community Search in Networks

Community search is important in graph analysis and has many real applications. Various community models have been proposed in the literature. However, most of them cannot identify the overlaps between communities well, even though such overlaps are an essential feature of real graphs. To address this issue, the $k$-clique percolation community model was proposed and has proven effective in many applications. Motivated by this, in this paper we adopt the $k$-clique percolation community model and study the densest clique percolation community search problem, which aims to find, among the $k$-clique percolation communities that contain a given set of query nodes, the one with the maximum $k$ value. We adopt an index-based approach to solve this problem.
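To make the query semantics concrete, here is a minimal brute-force sketch in Python. It assumes the standard clique percolation definition (two $k$-cliques are adjacent when they share $k-1$ nodes) and the well-known equivalence that a $k$-clique percolation community is the union of maximal cliques of size at least $k$ linked by chains of overlaps of at least $k-1$ nodes. Maximal cliques are enumerated with Bron-Kerbosch with pivoting. The helper names (`maximal_cliques`, `k_clique_communities`, `densest_cpc`, `build_adj`) are hypothetical, and scanning every $k$ is a naive baseline, not the index-based algorithm proposed in this paper.

```python
from itertools import combinations


def maximal_cliques(adj):
    """Enumerate maximal cliques with Bron-Kerbosch and pivoting."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(frozenset(r))
            return
        # Pivot on the vertex covering most candidates to prune branches.
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}  # v has been fully explored
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def find_root(parent, i):
    while parent[i] != i:
        parent[i] = parent[parent[i]]  # path halving
        i = parent[i]
    return i


def k_clique_communities(cliques, k):
    """k-clique percolation communities as unions of maximal cliques."""
    big = [c for c in cliques if len(c) >= k]
    parent = list(range(len(big)))
    for i, j in combinations(range(len(big)), 2):
        if len(big[i] & big[j]) >= k - 1:  # yields two adjacent k-cliques
            parent[find_root(parent, i)] = find_root(parent, j)
    communities = {}
    for i, c in enumerate(big):
        communities.setdefault(find_root(parent, i), set()).update(c)
    return list(communities.values())


def densest_cpc(adj, query):
    """Naive baseline: try every k from the largest clique size downwards."""
    cliques = maximal_cliques(adj)
    for k in range(max(map(len, cliques)), 1, -1):
        for community in k_clique_communities(cliques, k):
            if query <= community:
                return k, community
    return None


def build_adj(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


# Toy graph: two 4-cliques {1,2,3,4} and {3,4,5,6} sharing nodes 3 and 4.
edges = list(combinations([1, 2, 3, 4], 2)) + list(combinations([3, 4, 5, 6], 2))
adj = build_adj(edges)
print(densest_cpc(adj, {1, 2}))  # (4, {1, 2, 3, 4})
print(densest_cpc(adj, {1, 5}))  # (3, {1, 2, 3, 4, 5, 6})
```

On the toy graph, the query {1, 2} lies inside a single 4-clique, whereas {1, 5} only becomes connected at $k = 3$, where the two-node overlap meets the $k-1$ threshold.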
Based on the observation that a $k$-clique percolation community is a union of maximal cliques, we devise a novel compact index, $\mathsf{DCPC}$-$\mathsf{Index}$, to preserve the maximal cliques of the input graph and their connectivity information. With $\mathsf{DCPC}$-$\mathsf{Index}$, we can answer the densest clique percolation community query efficiently. We also propose an index construction algorithm based on the definition of $\mathsf{DCPC}$-$\mathsf{Index}$ and further improve it in terms of efficiency and memory consumption. We conduct extensive performance studies on real graphs, and the experimental results demonstrate the efficiency of both our index-based query processing algorithm and our index construction algorithm.
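One plausible way to preserve maximal cliques together with their connectivity is sketched below under our own assumptions; it is not necessarily the exact structure of $\mathsf{DCPC}$-$\mathsf{Index}$. Maximal cliques are taken as input, every pair of overlapping cliques is linked with weight equal to the overlap size, and Kruskal's algorithm with union-find keeps a maximum spanning forest of that clique graph. A maximum spanning forest preserves, for every threshold $t$, the connected components formed by edges of weight at least $t$, so a query for a given $k$ only needs forest edges of weight at least $k-1$. Two distinct maximal cliques sharing $w$ nodes each have at least $w+1$ nodes, so a clique smaller than $k$ never touches such an edge. The names `DisjointSet`, `build_clique_forest`, and `densest_query` are hypothetical.

```python
from itertools import combinations


class DisjointSet:
    """Union-find with path halving and union by size."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, i):
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]  # path halving
            i = self.parent[i]
        return i

    def union(self, i, j):
        """Merge the sets of i and j; return False if already merged."""
        ri, rj = self.find(i), self.find(j)
        if ri == rj:
            return False
        if self.size[ri] < self.size[rj]:
            ri, rj = rj, ri
        self.parent[rj] = ri
        self.size[ri] += self.size[rj]
        return True


def build_clique_forest(cliques):
    """Hypothetical index: maximum spanning forest of the clique-overlap graph."""
    edges = []
    for i, j in combinations(range(len(cliques)), 2):
        overlap = len(cliques[i] & cliques[j])
        if overlap > 0:
            edges.append((overlap, i, j))
    edges.sort(reverse=True)  # Kruskal: heaviest overlaps first
    ds = DisjointSet(len(cliques))
    forest = []
    for overlap, i, j in edges:
        if ds.union(i, j):  # keep only edges that join two trees
            forest.append((overlap, i, j))
    return forest


def densest_query(cliques, forest, query):
    """Largest k such that one k-clique percolation community covers `query`."""
    for k in range(max(map(len, cliques)), 1, -1):
        ds = DisjointSet(len(cliques))
        for overlap, i, j in forest:
            if overlap >= k - 1:  # cliques sharing >= k-1 nodes percolate
                ds.union(i, j)
        covered = {}  # component root -> query nodes it contains
        for idx, clique in enumerate(cliques):
            if len(clique) >= k:
                covered.setdefault(ds.find(idx), set()).update(clique & query)
        if any(nodes == query for nodes in covered.values()):
            return k
    return None


# Toy input: two 4-cliques sharing nodes 3 and 4, plus the edge 6-7.
cliques = [frozenset({1, 2, 3, 4}), frozenset({3, 4, 5, 6}), frozenset({6, 7})]
forest = build_clique_forest(cliques)
print(densest_query(cliques, forest, {1, 5}))  # 3
```

Each query then scans only the forest, which has fewer edges than there are maximal cliques, rather than every pair of cliques; the quadratic overlap computation is paid once, at construction time.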
