Advantages of Giraph over Hadoop in Graph Processing

This article compares the computing performance of Hadoop, a MapReduce tool, and Giraph on large-scale graphs. The main ideas of MapReduce and bulk synchronous parallel (BSP) are reviewed as big data computing approaches to highlight their applicability to large-scale graph processing. The article evaluates the execution performance of Hadoop and Giraph on the PageRank algorithm, which ranks web pages according to their relevance, and on several algorithms that find the minimum spanning tree of a graph, with the primary goal of identifying the most efficient computing approach for large-scale graphs. Experimental results show that using Giraph to process large graphs reduces execution time by 25% compared with Hadoop on the same experiments. Giraph emerges as the more efficient option thanks to its in-memory computing approach, which avoids direct interaction with secondary storage.
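To make the BSP idea concrete, the following is a minimal single-machine sketch of PageRank organized as supersteps: every vertex sends its rank share along its outgoing edges, a barrier ends the message exchange, and only then are ranks updated. It is an illustration in plain Python, not Giraph's API and not the implementation evaluated in the article. The function name `pagerank_bsp`, the dictionary-based graph layout, the damping factor of 0.85, and the even redistribution of rank from vertices without outgoing edges are all assumptions made for this example.

```python
from collections import defaultdict


def pagerank_bsp(graph, supersteps=30, damping=0.85):
    """Vertex-centric PageRank in BSP style (illustrative sketch).

    graph: dict mapping each vertex to a list of its out-neighbours.
           Every vertex must appear as a key (assumption of this sketch).
    """
    n = len(graph)
    rank = {v: 1.0 / n for v in graph}

    for _ in range(supersteps):
        inbox = defaultdict(float)  # messages addressed to each vertex
        dangling = 0.0              # rank held by vertices with no out-edges

        # Compute and communicate phase: each vertex splits its current
        # rank evenly among its out-neighbours.
        for v, out_neighbours in graph.items():
            if out_neighbours:
                share = rank[v] / len(out_neighbours)
                for w in out_neighbours:
                    inbox[w] += share
            else:
                dangling += rank[v]

        # Barrier: all messages are delivered before any rank changes.
        # The state stays in memory between supersteps.
        rank = {
            v: (1.0 - damping) / n + damping * (inbox[v] + dangling / n)
            for v in graph
        }

    return rank


if __name__ == "__main__":
    web = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
    for page, score in sorted(pagerank_bsp(web).items()):
        print(page, round(score, 4))
```

In this sketch the ranks live in memory across supersteps, which mirrors the in-memory approach the abstract credits for Giraph's advantage. An iterative MapReduce formulation would typically run one job per iteration and write intermediate ranks to the distributed file system between jobs.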
