Community Detection with Prior Knowledge

Community detection is challenging because hubs and noisy links tend to produce highly imbalanced graph clusters. The resulting clusters are often neither intuitive nor easy to interpret. With the growing availability of network information, a significant amount of prior knowledge is available about the communities in social, communication, and several other networks. These community labels may be noisy and very limited, yet they still help in community detection. In this paper, we explore the use of such noisy labeled information for finding high-quality communities. We present an adaptive density-based clustering method that allows flexible incorporation of prior knowledge into the community detection process. We use a random walk framework to compute the node densities; the level of supervision regulates these densities and the quality of the resulting density-based clusters. Our framework is general enough to produce both overlapping and non-overlapping clusters. We show empirically that even with a tiny amount of supervision, our approach can produce superior communities compared to popular baselines.
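To make the idea of supervision-regulated node densities concrete, the following is a minimal sketch and not the algorithm of this paper. It assumes a random walk with restart (in the style of personalized PageRank), in which a hypothetical parameter `supervision` blends a uniform restart distribution with one concentrated on labeled nodes. The function name, the parameters, and the toy graph are all illustrative assumptions.

```python
def random_walk_densities(adj, labeled, supervision=0.5, damping=0.85, iters=200):
    """Approximate stationary visit probabilities of a random walk with restart.

    Illustrative assumption only, not the paper's exact formulation.
    adj:         dict mapping node -> list of neighbours (undirected graph).
    labeled:     set of nodes that carry a (possibly noisy) community label.
    supervision: value in [0, 1]; 0 restarts uniformly over all nodes,
                 1 restarts only at labeled nodes.
    """
    nodes = list(adj)
    uniform = 1.0 / len(nodes)

    # Restart distribution: blend of uniform and labeled-only restarts.
    restart = {}
    for v in nodes:
        if labeled:
            prior = 1.0 / len(labeled) if v in labeled else 0.0
        else:
            prior = uniform  # no labels available: fall back to uniform
        restart[v] = (1.0 - supervision) * uniform + supervision * prior

    density = {v: uniform for v in nodes}
    for _ in range(iters):
        nxt = {v: (1.0 - damping) * restart[v] for v in nodes}
        for v in nodes:
            nbrs = adj[v]
            if nbrs:
                # Spread this node's mass evenly over its neighbours.
                share = damping * density[v] / len(nbrs)
                for u in nbrs:
                    nxt[u] += share
            else:
                # Isolated node: return its mass through the restart distribution.
                for u in nodes:
                    nxt[u] += damping * density[v] * restart[u]
        density = nxt
    return density


# Toy graph (hypothetical): two triangles joined by the bridge 2-3.
toy_graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
             3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
unsupervised = random_walk_densities(toy_graph, labeled={0}, supervision=0.0)
supervised = random_walk_densities(toy_graph, labeled={0}, supervision=1.0)
```

In this sketch, raising `supervision` shifts density toward the neighbourhood of the labeled node, so a density-based clustering step applied afterwards would see denser regions around the labeled communities. How the paper itself defines and thresholds densities is not stated in the abstract.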
