Robust Local Community Detection: On Free Rider Effect and Its Elimination

Given a large network, local community detection aims to find the community that contains a set of query nodes and maximizes (or minimizes) a goodness metric. This problem has recently drawn intense research interest, and various goodness metrics have been proposed. However, most existing metrics tend to include irrelevant subgraphs in the detected local community. We refer to such irrelevant subgraphs as free riders. We systematically study the existing goodness metrics and provide theoretical explanations of why they may cause the free rider effect. We further develop a query biased node weighting scheme to reduce the free rider effect. In particular, each node is weighted by its proximity to the query node. We define a query biased density metric that integrates the edge and node weights. After node weighting, the query biased densest subgraph, which is the subgraph with the largest query biased density, shifts to the neighborhood of the query nodes. We then formulate the query biased densest connected subgraph (QDC) problem, study its complexity, and provide efficient algorithms to solve it. We perform extensive experiments on a variety of real and synthetic networks to evaluate the effectiveness and efficiency of the proposed methods.
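As a rough illustration of the free rider effect and of query biased weighting, the following Python sketch compares a plain density (edges per node) with a proximity-weighted density on a toy graph. The abstract does not specify the proximity measure or the exact form of the metric, so several choices here are assumptions made only for illustration: proximity is computed by random walk with restart, each node's weight is the inverse of its proximity, and the density is the internal edge count divided by the total node weight. The function names and the toy graph are hypothetical.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build undirected adjacency sets from an edge list (simple graph assumed)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def rwr_proximity(adj, query, restart=0.15, iters=100):
    """Random walk with restart from `query`, computed by power iteration.

    Assumption: the abstract only says nodes are weighted by proximity to
    the query node; random walk with restart is one possible choice.
    """
    scores = {n: 0.0 for n in adj}
    scores[query] = 1.0
    for _ in range(iters):
        nxt = {n: 0.0 for n in adj}
        for u, mass in scores.items():
            # Spread the non-restarting mass evenly over the neighbors of u.
            share = (1.0 - restart) * mass / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        nxt[query] += restart  # restarting mass returns to the query node
        scores = nxt
    return scores


def internal_edge_count(adj, nodes):
    """Number of edges with both endpoints in `nodes`."""
    return sum(len(adj[u] & nodes) for u in nodes) / 2


def plain_density(adj, nodes):
    """Classic density: internal edges divided by number of nodes."""
    nodes = set(nodes)
    return internal_edge_count(adj, nodes) / len(nodes)


def query_biased_density(adj, nodes, proximity):
    """Internal edges divided by total node weight (assumed weight: 1 / proximity)."""
    nodes = set(nodes)
    total_weight = sum(
        1.0 / proximity[u] if proximity[u] > 0 else float("inf") for u in nodes
    )
    return internal_edge_count(adj, nodes) / total_weight


# Toy graph: a triangle {0, 1, 2} holding the query node 0, joined by the
# bridge (2, 3) to a distant 5-clique {3, 4, 5, 6, 7}.
triangle = {0, 1, 2}
clique = {3, 4, 5, 6, 7}
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
edges += [(u, v) for u in clique for v in clique if u < v]

adj = build_adjacency(edges)
proximity = rwr_proximity(adj, query=0)

for name, nodes in [("triangle", triangle), ("clique", clique)]:
    print(name, plain_density(adj, nodes), query_biased_density(adj, nodes, proximity))
```

In this toy graph the distant clique has the higher plain density, so a metric based on plain density would be pulled toward it even though it does not contain the query node. Under the assumed weighting, nodes far from the query carry large weights, and the triangle around the query scores higher instead.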
