Distributed algorithms for k-truss decomposition

The k-truss, a type of cohesive subgraph of a network, is an important measure for social network graphs. However, with the emergence of large online social networks, the running time of traditional batch algorithms for k-truss decomposition is usually prohibitively long on graphs with billions of edges and millions of vertices. Moreover, such a graph becomes too large to load into the main memory of a single machine. Cloud computing has become an essential way to process big data. Thus, our aim is to design a scalable algorithm for k-truss decomposition in a cloud computing setting. In this paper, we first improve the existing distributed k-truss decomposition algorithm in the MapReduce framework. We then propose a theoretical basis for k-truss and use it to design an algorithm based on graph-parallel abstractions. Our experimental results show that our method in the graph-parallel abstraction significantly outperforms the methods based on MapReduce in terms of running time and disk usage.
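For readers unfamiliar with the problem, the following is a minimal single-machine sketch of k-truss decomposition, assuming the standard definition: the k-truss is the largest subgraph in which every edge lies in at least k - 2 triangles. It illustrates the kind of batch peeling procedure the abstract refers to, not the distributed algorithms proposed in the paper; the function name `truss_decomposition` and the edge-list input format are hypothetical choices made for illustration.

```python
from collections import defaultdict


def truss_decomposition(edges):
    """Return {edge: trussness} for an undirected simple graph.

    Illustrative sketch only (not the paper's method). The trussness of an
    edge is the largest k such that the edge belongs to the k-truss.
    Vertices are assumed to be mutually comparable (e.g. all ints).
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    def key(u, v):
        # Canonical representation of an undirected edge.
        return (u, v) if u <= v else (v, u)

    # Support of an edge = number of triangles that contain it.
    support = {
        key(u, v): len(adj[u] & adj[v])
        for u in list(adj)
        for v in adj[u]
        if u < v
    }

    trussness = {}
    k = 2
    while support:
        # Edges with support <= k - 2 cannot be part of the (k + 1)-truss.
        queue = [e for e, s in support.items() if s <= k - 2]
        if not queue:
            k += 1
            continue
        while queue:
            e = queue.pop()
            if e not in support:  # already peeled via a duplicate entry
                continue
            u, v = e
            del support[e]
            trussness[e] = k
            # Removing (u, v) destroys one triangle per common neighbour w.
            for w in adj[u] & adj[v]:
                for f in (key(u, w), key(v, w)):
                    support[f] -= 1
                    if support[f] <= k - 2:
                        queue.append(f)
            adj[u].discard(v)
            adj[v].discard(u)
    return trussness
```

This sketch keeps the whole graph in one process's memory and rescans the edge set for each value of k, which is exactly what stops working at the scale the abstract describes and motivates the MapReduce and graph-parallel formulations.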
