MapReduce Performance Evaluation on a Private HPC Cloud

The convergence of accessible cloud computing resources and big data trends has introduced unprecedented opportunities for scientific computing and discovery. However, HPC cloud users face many challenges when selecting valid HPC configurations. In this paper, we report a set of performance evaluations of data-intensive benchmarks on a private HPC cloud to help with the selection of such configurations. More precisely, we study the effect of virtual machine core count on the performance of three benchmarks widely used by the MapReduce community. We observe that, depending on the computation-to-communication ratio of the application studied, using virtual machines with higher core counts does not always lead to higher performance for data-intensive applications.
