Benchmarking a MapReduce Environment on a Full Virtualisation Platform

This work analyses the performance of Hadoop, an implementation of the MapReduce programming model for distributed parallel computing, executing in a virtualisation environment of 1+16 nodes running VMware Workstation. A set of experiments using the standard Hadoop benchmarks was designed to determine whether Hadoop on this virtualisation platform, deployed over a local area network, yields significant reductions in computation time. Our findings indicate that it does: computing times decrease significantly under these conditions. They also highlight how overheads and virtualisation in a distributed environment prevent the system from reaching its maximum (peak) performance.
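The two claims above, reduced execution time and a shortfall from peak performance, are commonly quantified as relative speedup S = T_base / T_p and parallel efficiency E = S / (p / p_base), where E = 1 corresponds to ideal linear scaling. The following minimal Python sketch illustrates these metrics only under stated assumptions: the `Run` type and the `speedup` and `efficiency` functions are hypothetical names, and the timings are invented placeholders, not measurements from this work.

```python
"""Sketch: relative speedup and parallel efficiency from wall-clock times.

All timings below are hypothetical placeholders, not results from this study.
"""
from dataclasses import dataclass


@dataclass
class Run:
    workers: int    # number of worker nodes used for the run
    seconds: float  # measured wall-clock execution time of the job


def speedup(baseline: Run, run: Run) -> float:
    """Relative speedup S = T_base / T_run."""
    return baseline.seconds / run.seconds


def efficiency(baseline: Run, run: Run) -> float:
    """Parallel efficiency E = S / (p_run / p_base); 1.0 means ideal (peak) scaling."""
    ideal = run.workers / baseline.workers
    return speedup(baseline, run) / ideal


if __name__ == "__main__":
    # Hypothetical timings for illustration only; the smallest run is the baseline.
    runs = [Run(1, 1600.0), Run(4, 480.0), Run(8, 260.0), Run(16, 160.0)]
    base = runs[0]
    for r in runs:
        print(f"{r.workers:2d} workers: S = {speedup(base, r):5.2f}, "
              f"E = {efficiency(base, r):.2f}")
```

With these placeholder numbers, efficiency falls as workers are added even though execution time keeps dropping; a gap of that kind is what overheads from distribution and virtualisation would show up as in such measurements.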
