A novel criterion for overlapping communities detection and clustering improvement

In community detection, correctly identifying overlapping nodes, i.e. nodes that belong to more than one community, is important for two reasons: it is related to role detection, and it improves the quality of clustering, since properly detected overlaps give a better understanding of the community structure. In this paper, we introduce a novel measure, called cuttability, which we show to be useful for reliably detecting overlaps among communities and for improving the quality of the clustering, measured via modularity. The proposed algorithm performs better than existing techniques on the considered datasets (IRC logs and the Enron e-mail log). The best behaviour is observed when a network is split into micro-communities; in that case, the algorithm produces a better description of the community structure.
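Since clustering quality is measured via modularity, a minimal sketch of that score may help fix ideas. It assumes the standard Newman-Girvan definition for an undirected, unweighted graph and a non-overlapping partition, Q = sum over communities c of (e_c / m) - (d_c / 2m)^2, where m is the number of edges, e_c the number of edges inside c, and d_c the sum of degrees in c. The function name `modularity` and the toy graph are illustrative assumptions. This is not the cuttability measure, which is not defined in this section.

```python
from collections import defaultdict

def modularity(edges, community_of):
    """Newman-Girvan modularity of a partition (undirected, unweighted graph).

    edges: list of (u, v) pairs, each undirected edge listed once.
    community_of: dict mapping every node to its (single) community label.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    degree = defaultdict(int)
    internal = defaultdict(int)  # number of edges with both ends in community c
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += 1
    degree_sum = defaultdict(int)  # total degree of the nodes in community c
    for node, k in degree.items():
        degree_sum[community_of[node]] += k
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)

# Toy graph (hypothetical): two triangles joined by the single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_triangles = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_block = {n: "a" for n in range(6)}

q_split = modularity(edges, two_triangles)  # 5/14, about 0.357
q_whole = modularity(edges, one_block)      # 0.0
```

Splitting the toy graph into its two triangles scores higher than keeping it as one block, which is the sense in which a better clustering yields a higher modularity.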
