Community detection in graphs based on surprise maximization using firefly heuristics

The detection of node clusters (communities) in graphs has been at the core of many modeling paradigms emerging in different fields and disciplines such as Social Sciences, Biology, Chemistry, Telecommunications and Linguistics. When evaluating the quality of a clustering arrangement, unsupervised metrics such as modularity can be used; these rely on structural and topological characteristics of the cluster space rather than on an observed ground truth to be matched. One such metric is the recently published Surprise, which evaluates how statistically unlikely a given clustering arrangement is with respect to a random network featuring the same distribution of nodes per cluster. A number of algorithms have been proposed in the literature to maximize this metric, but their comparative performance varies significantly between networks of different shape and size. In this article, a novel heuristic community detection approach is proposed as a means to achieve a universally well-performing tool for graph clustering based on Surprise maximization. The heuristic scheme relies on the search procedure of the so-called Firefly Algorithm, a nature-inspired meta-heuristic solver based on the collective behavior, mutual attractiveness and random yet controlled movement of these insects. The proposed technique emulates these observed behavioral patterns of fireflies in the genotype of the graph clustering problem rather than on an encoded representation of its search space (phenotype). Simulation results indicate that the performance of the proposed community detection scheme generalizes better than that of other schemes when applied to synthetically generated graphs with varying properties.
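To make the Surprise metric concrete, the following is a minimal sketch of how it is commonly computed: as the negative logarithm of a cumulative hypergeometric tail probability. It is an illustration under stated assumptions, not the implementation used in the article. The function name `surprise`, the edge-list input format, the integer node labels and the choice of the natural logarithm are all assumptions made here; the graph is assumed to be simple and undirected, with each edge listed once.

```python
from collections import Counter
from math import comb, log


def surprise(n_nodes, edges, labels):
    """Surprise of a partition (hypothetical helper, natural-log convention).

    n_nodes: number of nodes, labelled 0 .. n_nodes - 1
    edges:   list of (u, v) pairs, simple undirected graph, no duplicates
    labels:  labels[i] is the community of node i
    """
    # F: number of node pairs, i.e. the maximum possible number of links.
    F = n_nodes * (n_nodes - 1) // 2
    # M: maximum possible number of intra-community links for these cluster sizes.
    sizes = Counter(labels)
    M = sum(s * (s - 1) // 2 for s in sizes.values())
    # n: observed links; p: observed intra-community links.
    n = len(edges)
    p = sum(1 for u, v in edges if labels[u] == labels[v])
    # Hypergeometric tail: chance of seeing at least p intra-community links
    # when n links are placed uniformly at random among the F node pairs.
    # Exact integer arithmetic avoids underflow for small probabilities.
    tail = sum(comb(M, j) * comb(F - M, n - j) for j in range(p, min(M, n) + 1))
    return -(log(tail) - log(comb(F, n)))


# Two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
by_triangle = surprise(6, edges, [0, 0, 0, 1, 1, 1])
one_cluster = surprise(6, edges, [0, 0, 0, 0, 0, 0])
print(by_triangle > one_cluster)
```

In this toy graph, splitting the nodes by triangle scores higher than placing all nodes in one community, which scores zero because every link is then trivially intra-community. A search heuristic such as the one proposed in the article would use a function of this kind as its fitness measure.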
