Active Semi-Supervised Community Detection Based on Must-Link and Cannot-Link Constraints

Community structure detection is important because it helps reveal the relationship between the function and the topological structure of a network. Many community detection algorithms have been proposed, but incorporating prior knowledge into the detection process remains a challenging problem. In this paper, we propose a semi-supervised community detection algorithm that makes full use of must-link and cannot-link constraints to guide the detection process and thereby extracts high-quality community structures from networks. To acquire high-quality must-link and cannot-link constraints, we also propose a semi-supervised component generation algorithm based on active learning. It selects, step by step, the nodes with maximum utility for the proposed semi-supervised community detection algorithm, and then generates must-link and cannot-link constraints by querying a noiseless oracle. Extensive experiments show that introducing active learning into community detection is effective: the proposed method extracts high-quality community structures from networks and significantly outperforms the compared methods.
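The interplay between constrained detection and active constraint generation can be illustrated with a small sketch. This is not the authors' algorithm; it assumes (1) constraint-aware label propagation as the detector, where each must-link group updates as one unit and never adopts a label held by a cannot-linked group, (2) a hypothetical "utility" equal to the smallest normalized margin between a node's two most frequent neighbor labels, and (3) an Explore/Consolidate-style querying scheme in which a selected node is compared with one representative per existing component. The names `constrained_lpa`, `select_node`, `active_constraints` and the `oracle` callable are hypothetical.

```python
from collections import Counter


class UnionFind:
    """Must-link groups: nodes in one set are forced into one community."""

    def __init__(self, nodes):
        self.parent = {v: v for v in nodes}

    def find(self, v):
        while self.parent[v] != v:
            self.parent[v] = self.parent[self.parent[v]]  # path halving
            v = self.parent[v]
        return v

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[max(ra, rb)] = min(ra, rb)  # smallest node is the root


def constrained_lpa(adj, must_link, cannot_link, max_iter=100):
    """Hypothetical detector: label propagation where each must-link group moves
    as one unit and never takes a label held by a cannot-linked group."""
    uf = UnionFind(adj)
    for a, b in must_link:
        uf.union(a, b)
    groups = {}
    for v in sorted(adj):
        groups.setdefault(uf.find(v), []).append(v)
    forbid = {r: set() for r in groups}  # cannot-links lifted to group level
    for a, b in cannot_link:
        ra, rb = uf.find(a), uf.find(b)
        forbid[ra].add(rb)
        forbid[rb].add(ra)
    label = {r: r for r in groups}  # every group starts with its own label
    for _ in range(max_iter):
        changed = False
        for r in sorted(groups):
            counts = Counter(label[uf.find(u)] for v in groups[r] for u in adj[v])
            banned = {label[s] for s in forbid[r]}
            options = {lab: c for lab, c in counts.items() if lab not in banned}
            if not options:
                continue
            best = max(options.values())
            if options.get(label[r]) == best:
                continue  # keep the current label on ties to help convergence
            label[r] = min(lab for lab, c in options.items() if c == best)
            changed = True
        if not changed:
            break
    return {v: label[uf.find(v)] for v in adj}


def select_node(adj, labels, queried):
    """Hypothetical utility: the unqueried node whose neighbourhood is most
    evenly split between its two most common labels (smallest margin)."""
    def margin(v):
        top = Counter(labels[u] for u in adj[v]).most_common(2) + [(None, 0)] * 2
        return (top[0][1] - top[1][1]) / max(len(adj[v]), 1)
    candidates = [v for v in sorted(adj) if v not in queried]
    return min(candidates, key=margin) if candidates else None


def active_constraints(adj, oracle, budget):
    """Grow components by querying a noiseless oracle: 'same' answers give
    must-links, 'different' answers give cannot-links, and a node matching
    no existing component becomes the representative of a new one."""
    must, cannot, reps, queried, queries = [], [], [], set(), 0
    while queries < budget:
        v = select_node(adj, constrained_lpa(adj, must, cannot), queried)
        if v is None:
            break
        queried.add(v)
        for r in reps:
            if queries >= budget:
                break
            queries += 1
            if oracle(v, r):
                must.append((v, r))
                break
            cannot.append((v, r))
        else:
            reps.append(v)  # matched no component (or none exist yet)
    return must, cannot


if __name__ == "__main__":
    # Two 4-cliques joined by the bridge 3-4; the oracle knows the true split.
    adj = {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2, 4],
           4: [3, 5, 6, 7], 5: [4, 6, 7], 6: [4, 5, 7], 7: [4, 5, 6]}
    oracle = lambda a, b: (a < 4) == (b < 4)
    must, cannot = active_constraints(adj, oracle, budget=8)
    print(constrained_lpa(adj, must, cannot))
    # {0: 0, 1: 0, 2: 0, 3: 0, 4: 4, 5: 4, 6: 4, 7: 4}
```

On this toy graph, the same label propagation without constraints merges both cliques into a single community; the oracle's cannot-link answers are what stop the bridge node 4 from adopting the left clique's label. Lifting constraints to the group level keeps the invariant that two cannot-linked groups never share a label at any point during propagation.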
