k-core Decomposition on Giraph and GraphChi

The analysis of large-scale graphs has proven valuable in social networks, spam detection, epidemic disease control, the analysis of software systems, and other areas. However, running graph algorithms on massive datasets remains difficult, both because of the large data volume and because of the complexity of the algorithms themselves. A number of large-scale graph processing platforms have therefore been developed to tackle these problems. GraphChi is a popular system capable of processing massive graphs on a single PC. Some researchers claim that GraphChi performs as well as, or even better than, distributed graph-analytics platforms such as the popular Apache Giraph. In this paper, we implement a well-optimized k-core decomposition algorithm on Giraph. We then compare the performance of k-core decomposition on Giraph and GraphChi using various graph datasets.
