Distributed evolutionary approach to data clustering and modeling

In this article we describe DEGA-Gen, a framework for applying distributed genetic algorithms to the detection of communities in networks. The framework proposes efficient ways of encoding the network in the chromosomes, which reduces memory use and computation and makes the framework scalable. Different objective functions may be used to divide a network into communities. The framework is implemented on Hadoop, an open-source implementation of the MapReduce paradigm. We validate the framework by developing a community detection algorithm that uses modularity as the measure of partition quality. The algorithm outputs the network partitioned into non-overlapping communities, chosen so as to maximize network modularity. We apply the algorithm to well-known data sets: the Zachary karate club, the bottlenose dolphins network, the college football dataset, and the US political books dataset. The framework achieves comparable modularity while using much less memory for the network representation. Further, the framework is scalable and can handle large graphs, as shown by tests on a larger youtube.com dataset.
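To make the objective concrete, the following is a minimal sketch of how modularity could be evaluated as a fitness function for one candidate partition. It assumes an undirected, unweighted graph given as an edge list and a chromosome that simply stores one community label per node. This label-per-node encoding and the names `modularity`, `edges`, and `labels` are illustrative assumptions, not the compact encoding used by DEGA-Gen, which the abstract does not specify.

```python
from collections import defaultdict


def modularity(edges, labels):
    """Newman modularity Q of a non-overlapping partition.

    edges  -- list of (u, v) pairs of an undirected, unweighted graph
    labels -- labels[node] is the community of that node
              (hypothetical chromosome encoding, assumed for illustration)
    """
    m = len(edges)
    if m == 0:
        return 0.0

    intra = defaultdict(int)   # edges with both endpoints in community c
    degree = defaultdict(int)  # sum of node degrees in community c

    for u, v in edges:
        degree[labels[u]] += 1
        degree[labels[v]] += 1
        if labels[u] == labels[v]:
            intra[labels[u]] += 1

    # Q = sum over communities of (L_c / m) - (d_c / 2m)^2
    return sum(intra[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Toy graph: two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = [0, 0, 0, 1, 1, 1]   # one community per triangle
merged = [0, 0, 0, 0, 0, 0]  # everything in a single community

q_split = modularity(edges, split)
q_merged = modularity(edges, merged)
```

On this toy graph the two-triangle partition scores 5/14 (about 0.357), while the single-community partition scores 0, so a genetic algorithm that maximizes this fitness would prefer the former.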
