Paper information - A time-efficient connected densest subgraph discovery algorithm for big data

In this paper, we propose a time-efficient, exact algorithm for discovering the densest subgraph in big data. Existing algorithms for this problem have three limitations: i) they cannot resolve the trade-off between the efficiency of handling big data and the precision of the discovered densest subgraph; ii) they cannot exploit both parallel computing on MapReduce and in-memory computing on a single computer; iii) their applicability to different kinds of graphs has not been discussed. Our algorithm combines MapReduce parallel computing with in-memory computing on a single computer to improve both the efficiency and the precision of densest subgraph discovery. It consists of two computational phases: i) graph reduction in the MapReduce framework; ii) densest subgraph discovery in memory. We also theoretically analyze the correctness of the algorithm and its applicability to different natural graphs. We conduct extensive experimental evaluations in a MapReduce framework on both massive real-world graphs and simulated graphs, comparing our algorithm with other algorithms. The experimental results show that our algorithm is more time-efficient and more precise than the other algorithms.
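The two-phase idea of reducing the graph first and then searching the remainder exactly in memory can be illustrated with a small single-machine sketch. The abstract does not specify the paper's reduction rule or its in-memory solver, so everything below is an assumption made for illustration: density is taken as |E(S)| / |S|, the reduction is a standard k-core pruning (every vertex of a densest subgraph has degree at least the optimal density inside it, so pruning to the k-core with k equal to the ceiling of any lower bound on that density cannot remove it), and the exact phase is a brute-force search that is only practical on tiny graphs. The function names `density`, `k_core`, and `densest_subgraph` are hypothetical, and no MapReduce framework is involved.

```python
from itertools import combinations
from math import ceil


def density(nodes, edges):
    """Return |E(S)| / |S| for the subgraph induced by `nodes`."""
    nodes = set(nodes)
    if not nodes:
        return 0.0
    inside = sum(1 for u, v in edges if u in nodes and v in nodes)
    return inside / len(nodes)


def k_core(nodes, edges, k):
    """Repeatedly drop vertices whose remaining degree is below k."""
    alive = set(nodes)
    while True:
        deg = {u: 0 for u in alive}
        for u, v in edges:
            if u in alive and v in alive:
                deg[u] += 1
                deg[v] += 1
        weak = {u for u in alive if deg[u] < k}
        if not weak:
            return alive
        alive -= weak


def densest_subgraph(nodes, edges):
    """Reduce with a k-core, then search the reduced graph exhaustively."""
    # Phase 1 (assumed stand-in for the graph reduction): the density of the
    # whole graph is a lower bound on the optimum, so the ceil(lower)-core
    # still contains a densest subgraph.
    lower = density(nodes, edges)
    reduced = sorted(k_core(nodes, edges, ceil(lower)))

    # Phase 2 (assumed stand-in for the in-memory phase): exact brute force
    # over all non-empty vertex subsets of the reduced graph.
    best, best_rho = set(), 0.0
    for size in range(1, len(reduced) + 1):
        for subset in combinations(reduced, size):
            rho = density(subset, edges)
            if rho > best_rho:
                best, best_rho = set(subset), rho
    return best, best_rho


# A 4-clique on vertices 0..3 with a path 3-4-5 attached.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
best, rho = densest_subgraph(range(6), edges)
print(sorted(best), rho)  # [0, 1, 2, 3] 1.5
```

In this toy graph the pruning step removes the attached path before the exhaustive search runs, so the exact phase only has to examine subsets of the 4-clique. A practical in-memory phase would replace the brute force with a polynomial-time exact method.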

Haiying Shen | Bo Wu
