A time-efficient connected densest subgraph discovery algorithm for big data

In this paper, we propose a time-efficient, exact algorithm for discovering the densest subgraph in big data. Existing algorithms for this problem have three shortcomings: i) they cannot resolve the trade-off between efficiency in handling big data and precision of the discovered densest subgraph; ii) they cannot exploit both parallel computing on MapReduce and in-memory computing on a single computer; iii) their applicability to different kinds of graphs has not been discussed. Our algorithm combines MapReduce parallel computing with in-memory computing on a single computer to improve both the efficiency and the precision of densest subgraph discovery. It consists of two computational phases: i) graph reduction in the MapReduce framework; ii) densest subgraph discovery in memory. We further analyze, theoretically, the correctness of the algorithm and its applicability to different natural graphs. We conduct extensive experimental evaluations in a MapReduce framework on both massive real-world graphs and simulated graphs, comparing our algorithm with other algorithms. Experimental results show that our algorithm is more time-efficient and more precise than the compared algorithms.
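The abstract names the two phases but not their internals, so the following is only a minimal in-memory sketch under our own assumptions, not the paper's algorithm. Phase 1 is modelled as peeling every vertex whose degree falls below ceil(|E|/|V|); the densest subgraph S survives this because each of its vertices has degree at least density(S) >= |E|/|V| inside S. Each loop iteration removes all low-degree vertices at once, standing in for one synchronous MapReduce round. Phase 2 swaps in Goldberg's 1984 max-flow construction, repeated while the density improves and solved with a plain Edmonds-Karp routine. The function names (`prune_by_density_bound`, `min_cut_source_side`, `densest_exact`, `densest_subgraph`) are hypothetical.

```python
from collections import defaultdict, deque
from fractions import Fraction
import math


def prune_by_density_bound(edges):
    """Phase 1 sketch (assumed rule): peel to the ceil(L)-core, L = |E|/|V|.

    Each while-iteration drops all low-degree vertices at once, standing in
    for one synchronous MapReduce round (run in memory here).
    """
    alive = {tuple(sorted(e)) for e in edges}  # undirected, no duplicates
    nodes = {u for e in alive for u in e}
    k = math.ceil(Fraction(len(alive), len(nodes)))
    while True:
        deg = defaultdict(int)
        for u, v in alive:
            deg[u] += 1
            deg[v] += 1
        low = {u for u, d in deg.items() if d < k}
        if not low:
            return alive
        alive = {(u, v) for u, v in alive if u not in low and v not in low}


def min_cut_source_side(cap, s, t):
    """Edmonds-Karp max flow; returns the vertices reachable from s afterwards."""
    res = defaultdict(dict)
    for u in cap:
        for v, c in cap[u].items():
            res[u][v] = res[u].get(v, 0) + c
            res[v].setdefault(u, 0)  # reverse residual arc
    while True:
        parent, queue = {s: None}, deque([s])
        while queue and t not in parent:  # BFS for a shortest augmenting path
            u = queue.popleft()
            for v, c in res[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:  # no augmenting path: flow is maximum
            return set(parent)
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        aug = min(res[a][b] for a, b in path)
        for a, b in path:
            res[a][b] -= aug
            res[b][a] += aug


def densest_exact(edges):
    """Phase 2 sketch: Goldberg's min-cut test, repeated while density improves."""
    nodes = {u for e in edges for u in e}
    m = len(edges)
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    s, t = object(), object()  # sentinels that cannot clash with vertex labels
    best, g = nodes, Fraction(m, len(nodes))
    while True:
        cap = defaultdict(dict)
        for v in nodes:
            cap[s][v] = m
            cap[v][t] = m + 2 * g - deg[v]  # non-negative because deg[v] <= m
        for u, v in edges:
            cap[u][v] = 1
            cap[v][u] = 1
        # Cut value = m*|V| + 2*(g*|V1| - |E(V1)|), so a non-empty minimal
        # source side V1 has density strictly greater than g.
        side = min_cut_source_side(cap, s, t) - {s}
        if not side:
            return best, g
        inner = sum(1 for u, v in edges if u in side and v in side)
        best, g = side, Fraction(inner, len(side))


def densest_subgraph(edges):
    """Phase 1 reduction followed by the exact phase 2 on the reduced graph."""
    return densest_exact(prune_by_density_bound(edges))
```

For example, on a 4-clique with a 4-cycle attached at one vertex, pruning removes nothing, and the min-cut loop moves from the whole graph (density 10/7) to the clique (density 3/2). Exact fractions are used so that the density comparisons inside the cut test are never disturbed by rounding.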