Community structure mining in big data social media networks with MapReduce

Social media networks are playing an increasingly prominent role in people's daily lives. Community structure is one of the salient features of social media networks and has been applied in practical applications such as recommendation systems and network marketing. With the rapid growth of social media networks and the surge in the amount of information they carry, identifying communities in big data scenarios has become a challenge. Based on our previous work and the map equation (an equation from information theory for community mining), we develop a novel distributed community structure mining framework. In the framework, (1) we propose a new link information update method that aims to avoid data-writing operations and speed up the process; (2) we use local information from the nodes and their neighbors, instead of PageRank, to calculate the probability distribution of the nodes; and (3) we drop the network partitioning step of our previous work and run the map equation directly on MapReduce. Empirical results on real-world social media networks and artificial networks show that the new framework outperforms our previous work and some well-known algorithms, such as Radetal and FastGN, in accuracy, speed, and scalability.
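To make the role of the map equation and of local node information concrete, the following is a minimal single-machine sketch, not the distributed framework itself. It assumes an undirected, unweighted network, for which a node's visit probability can be taken as its degree divided by twice the number of edges (a purely local quantity), and it evaluates the standard two-level map equation of Rosvall and Bergstrom for a given partition. The function names `plogp` and `map_equation` and the toy network are illustrative assumptions; the abstract does not specify the exact local estimate used in the framework.

```python
from collections import defaultdict
from math import log2


def plogp(p):
    """Return p * log2(p), with the convention 0 * log2(0) = 0."""
    return p * log2(p) if p > 0 else 0.0


def map_equation(edges, module_of):
    """Two-level map equation L(M), in bits, for an undirected, unweighted graph.

    edges     : list of (u, v) pairs, each undirected edge listed once
    module_of : dict mapping every node to its module label
    """
    # Local information only: each node's degree.
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    total = 2 * len(edges)

    # Assumed visit probability: degree / (2 * number of edges).
    p = {node: d / total for node, d in degree.items()}

    # Exit probability of a module: each edge crossing a module boundary
    # contributes 1 / (2 * number of edges) to both modules it touches.
    exit_rate = defaultdict(float)
    for u, v in edges:
        if module_of[u] != module_of[v]:
            exit_rate[module_of[u]] += 1 / total
            exit_rate[module_of[v]] += 1 / total

    # Total visit probability of the nodes inside each module.
    module_p = defaultdict(float)
    for node, pn in p.items():
        module_p[module_of[node]] += pn

    # Expanded form of the map equation:
    # L = q log q - 2 sum_i q_i log q_i - sum_a p_a log p_a
    #     + sum_i (q_i + sum_{a in i} p_a) log (q_i + sum_{a in i} p_a)
    q_total = sum(exit_rate.values())
    length = plogp(q_total)
    length -= 2 * sum(plogp(exit_rate[m]) for m in module_p)
    length -= sum(plogp(pn) for pn in p.values())
    length += sum(plogp(exit_rate[m] + module_p[m]) for m in module_p)
    return length


# Toy network (hypothetical): two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
one_module = {n: "A" for n in range(6)}
two_modules = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}

length_one = map_equation(edges, one_module)
length_two = map_equation(edges, two_modules)
print(length_two < length_one)
```

On this toy network the partition into the two triangles gives a shorter description length than placing all nodes in one module, which is the criterion a map-equation-based method minimizes when searching for communities.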
