A Stochastic Model for Detecting Overlapping and Hierarchical Community Structure

Community detection is a fundamental problem in the analysis of complex networks. Recently, many researchers have concentrated on the detection of overlapping communities, where a vertex may belong to more than one community. However, most current methods require the number (or the size) of the communities as a priori information, which is usually unavailable for real-world networks. A practical algorithm should therefore not only find the overlapping community structure, but also determine the number of communities automatically. It is also preferable for the method to reveal the hierarchical structure of the network. In this work, we first propose a generative model that employs a nonnegative matrix factorization (NMF) formulation with an l2,1-norm regularization term, balanced by a resolution parameter. Each ingredient serves a distinct purpose: NMF yields overlapping community structure by assigning soft membership variables to each vertex; the l2,1 regularization term is a group-sparsity technique that determines the number of communities automatically by penalizing an excess of nonempty communities; and the resolution parameter, by controlling the strength of this penalty, lets us explore the hierarchical structure of the network. We then derive a multiplicative update rule for learning the model parameters and prove its correctness. Finally, we test our approach on a variety of synthetic and real-world networks and compare it with several state-of-the-art algorithms. The results validate the superior performance of our method.
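The interplay of the three ingredients described above can be illustrated with a minimal sketch. It is not the model or the update rule derived in the paper: it assumes a squared-error loss 0.5*||A - WH||^2, an l2,1 penalty on the columns of W and the rows of H, and a heuristic multiplicative update obtained by adding the penalty gradient to the denominator. The names `l21_nmf`, `active_communities`, `lam`, `k_max`, and `tol` are hypothetical.

```python
import numpy as np

def l21_nmf(A, k_max, lam, n_iter=300, eps=1e-10, seed=0):
    """Heuristic sketch: minimize 0.5*||A - W H||_F^2
    + lam * (sum_k ||W[:, k]||_2 + sum_k ||H[k, :]||_2)
    with multiplicative updates (assumed form, not the paper's rule)."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    W = rng.random((n, k_max))   # soft memberships, one column per community
    H = rng.random((k_max, n))
    for _ in range(n_iter):
        # Group-sparsity term: gradient of ||W[:, k]||_2 is W[:, k] / ||W[:, k]||_2
        col_norms = np.linalg.norm(W, axis=0) + eps
        W *= (A @ H.T) / (W @ (H @ H.T) + lam * W / col_norms + eps)
        row_norms = np.linalg.norm(H, axis=1) + eps
        H *= (W.T @ A) / ((W.T @ W) @ H + lam * H / row_norms[:, None] + eps)
    return W, H

def active_communities(W, tol=1e-3):
    """Indices of communities whose membership column has not been shrunk to ~0."""
    return np.flatnonzero(np.linalg.norm(W, axis=0) > tol)

if __name__ == "__main__":
    # Toy graph (assumed example): two 4-cliques sharing vertex 3.
    A = np.zeros((7, 7))
    for group in ([0, 1, 2, 3], [3, 4, 5, 6]):
        for i in group:
            for j in group:
                if i != j:
                    A[i, j] = 1.0
    # Sweep the resolution parameter; a larger lam penalizes nonempty
    # communities more strongly, so fewer columns are expected to survive.
    for lam in (0.1, 1.0, 3.0):
        W, _ = l21_nmf(A, k_max=5, lam=lam)
        print(lam, len(active_communities(W)))
```

Starting from a deliberately large `k_max` and counting the surviving columns mimics the automatic selection of the number of communities; sweeping `lam` mimics the exploration of coarser and finer levels of the hierarchy. The threshold `tol` is an arbitrary choice for this sketch.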
