An analytical performance model of MapReduce

MapReduce is a distributed computing framework, and its application in distributed systems is a rapidly growing field. Although the framework can use clusters to improve computing performance, tuning it remains challenging. Most existing work on MapReduce performance relies on system monitoring and simulation and lacks an analytical performance model. In this paper, we propose a simple and general MapReduce performance model that clarifies how each component affects overall program performance, and we verify it on a small cluster. The results indicate that our model can predict the performance of a MapReduce system and its relation to the configuration. According to our model, performance can be improved significantly by adjusting the Map split granularity and the number of reducers, without modifying the framework. The model also points out potential bottlenecks in the framework and directions for future performance improvements.
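To make the role of split granularity and reducer count concrete, the following is a minimal sketch of a wave-based cost estimate. It is not the model proposed in this paper. The function name, every parameter, and all default values are hypothetical, and it assumes for simplicity that the intermediate data volume equals the input size and that the map and reduce phases do not overlap.

```python
import math


def estimate_job_time(input_mb, num_splits, num_reducers,
                      map_slots=8, reduce_slots=4,
                      map_rate_mb_s=50.0, reduce_rate_mb_s=40.0,
                      task_overhead_s=1.0):
    """Toy estimate of job time in seconds (all parameters are assumptions).

    Tasks run in "waves": with S slots and T tasks, ceil(T / S) waves are
    needed, and each wave lasts as long as one task.
    """
    # Map phase: each task pays a fixed startup overhead plus processing time.
    map_waves = math.ceil(num_splits / map_slots)
    map_task_s = task_overhead_s + (input_mb / num_splits) / map_rate_mb_s

    # Reduce phase: assumes intermediate data equals input size (assumption).
    reduce_waves = math.ceil(num_reducers / reduce_slots)
    reduce_task_s = task_overhead_s + (input_mb / num_reducers) / reduce_rate_mb_s

    return map_waves * map_task_s + reduce_waves * reduce_task_s


if __name__ == "__main__":
    # Compare a few split counts for a hypothetical 800 MB job.
    for splits in (4, 8, 16):
        print(splits, estimate_job_time(800, splits, num_reducers=4))
```

Under these assumed values, too few splits leave slots idle, while too many add waves and per-task overhead, which is the kind of trade-off an analytical model is meant to expose.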
