Paper information - An evaluation study of BigData frameworks for graph processing

An evaluation study of BigData frameworks for graph processing

When Google first introduced the Map/Reduce paradigm in 2004, no comparable system had been available to the general public. The situation has changed since then. The Map/Reduce paradigm has become increasingly popular and there is no shortage of Map/Reduce implementations in today's computing world. The predominant solution is currently Apache Hadoop, started by Yahoo. Besides employing custom Map/Reduce installations, customers of cloud services can now exploit ready-made installations (e.g. the Elastic Map/Reduce System). In the meantime, other, second-generation frameworks have started to appear. They either fine-tune the Map/Reduce model for specific scenarios, or change the paradigm altogether, such as Google's Pregel. In this paper, we present a comparison between these second-generation frameworks and the current de facto standard Hadoop, by focusing on a specific scenario: large-scale graph analysis. We analyze the different means of fine-tuning those systems by exploiting their unique features. We base our analysis on the k-core decomposition problem, whose goal is to compute the centrality of each node in a given graph; we tested our implementation in a cluster of Amazon EC2 nodes with realistic datasets made publicly available by the SNAP project.

Alberto Montresor | Benedikt Elser

[1] Stephen B. Seidman, et al. Network structure and minimum degree, 1983.

[2] Alessandro Vespignani, et al. Large scale networks fingerprinting and visualization using the k-core decomposition, 2005, NIPS.

[3] Jonathan W. Berry, et al. Challenges in Parallel Graph Processing, 2007, Parallel Process. Lett.

[4] Silvio Lattanzi, et al. Filtering: a method for solving graph problems in MapReduce, 2011, SPAA '11.

[5] Gary D. Bader, et al. Analyzing yeast protein–protein interaction data obtained from different sources, 2002, Nature Biotechnology.

[6] Sanjay Ghemawat, et al. MapReduce: Simplified Data Processing on Large Clusters, 2004, OSDI.

[7] Aart J. C. Bik, et al. Pregel: a system for large-scale graph processing, 2010, SIGMOD Conference.

[8] Volker Markl, et al. Spinning Fast Iterative Data Flows, 2012, Proc. VLDB Endow.

[9] Francesco De Pellegrini, et al. General, 1895, The Social History of Alcohol Review.

[10] Jignesh M. Patel, et al. A comparison of join algorithms for log processing in MapReduce, 2010, SIGMOD Conference.

[11] Steven Hand, et al. The Seven Deadly Sins of Cloud Computing Research, 2012, HotCloud.

[12] Joseph M. Hellerstein, et al. GraphLab: A New Framework For Parallel Machine Learning, 2010, UAI.

[13] Abhinandan Das, et al. Google news personalization: scalable online collaborative filtering, 2007, WWW '07.

[14] Jonathan Cohen, et al. Graph Twiddling in a MapReduce World, 2009, Computing in Science & Engineering.

[15] Carlos Guestrin, et al. Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud, 2012.

[16] Astrid Rheinländer, et al. Opening the Black Boxes in Data Flow Optimization, 2012, Proc. VLDB Endow.

[17] Jin-Soo Kim, et al. HAMA: An Efficient Matrix Computation with the MapReduce Framework, 2010, 2010 IEEE Second International Conference on Cloud Computing Technology and Science.

[18] Leslie G. Valiant, et al. A bridging model for parallel computation, 1990, CACM.

[19] Douglas Stott Parker, et al. Map-reduce-merge: simplified relational data processing on large clusters, 2007, SIGMOD '07.

[20] Jimmy J. Lin, et al. Design patterns for efficient graph algorithms in MapReduce, 2010, MLG '10.