Proposal: Software Auto-Tuning for MapReduce

MapReduce-based utility computing has become widespread. Performance tuning of MapReduce frameworks such as Hadoop is notoriously difficult, because users often lack either the expertise to tune critical configuration options or access to them. We propose an auto-tuning approach, based on supervised learning, for MapReduce applications running on Hadoop clusters. With auto-tuning, we aim to deliver performance that is significantly better than that obtained with the default Hadoop parameters, and within 5% of the best performance that parameter-space search techniques can find.

Research Problem: to use statistical machine learning to auto-tune MapReduce applications running on the Hadoop platform.
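The abstract leaves the learning method open, so the following is only a minimal sketch, assuming one common model-based scheme: train a regressor on past runs that maps (input size, configuration) to runtime, then pick the configuration with the lowest predicted runtime instead of running every candidate. Nearest-neighbour regression stands in for whatever supervised learner the project adopts. The parameter grid, the `measure` cost function (a made-up formula standing in for timing real jobs) and all helper names are hypothetical; only the two configuration keys, `mapreduce.job.reduces` and `mapreduce.task.io.sort.mb`, and their shipped defaults come from Hadoop itself. The last lines check the 5% target by comparing the tuned runtime with the optimum found by exhaustive search over the same grid.

```python
import itertools
import math

# Two real Hadoop settings; their shipped defaults are mapreduce.job.reduces=1
# and mapreduce.task.io.sort.mb=100. The candidate grid is hypothetical.
REDUCES = [1, 2, 4, 8, 16, 32]      # mapreduce.job.reduces
SORT_MB = [50, 100, 200, 400]       # mapreduce.task.io.sort.mb
SPACE = list(itertools.product(REDUCES, SORT_MB))
DEFAULT = (1, 100)


def measure(input_gb, reduces, sort_mb):
    """Hypothetical stand-in for timing a real job run; not a model of Hadoop."""
    return (input_gb * 60.0 / reduces      # reduce work shared by the tasks
            + 2.0 * reduces                # fixed start-up cost per task
            + input_gb * 800.0 / sort_mb   # extra spills with a small buffer
            + 0.05 * sort_mb)              # memory pressure with a large buffer


def features(input_gb, reduces, sort_mb):
    # Log-scale features, so doubling any value is a step of 1.
    return (math.log2(input_gb), math.log2(reduces), math.log2(sort_mb))


def train(runs):
    """runs: list of ((input_gb, reduces, sort_mb), runtime) training examples."""
    return [(features(*x), y) for x, y in runs]


def predict(model, x, k=1):
    """k-nearest-neighbour regression: average runtime of the k closest runs."""
    f = features(*x)
    nearest = sorted(model, key=lambda s: math.dist(s[0], f))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def auto_tune(model, input_gb):
    """Pick the configuration with the lowest *predicted* runtime (no job runs)."""
    return min(SPACE, key=lambda c: predict(model, (input_gb, *c)))


def search_best(input_gb):
    """Parameter-space search baseline: run every configuration, keep the fastest."""
    return min(SPACE, key=lambda c: measure(input_gb, *c))


# Offline training runs on a few input sizes; the new job below is not among them.
runs = [((gb, *c), measure(gb, *c)) for gb in (1, 4, 16, 64) for c in SPACE]
model = train(runs)

job_gb = 20
tuned, best = auto_tune(model, job_gb), search_best(job_gb)
t_tuned, t_best, t_default = (measure(job_gb, *c) for c in (tuned, best, DEFAULT))
print(f"tuned={tuned} best={best} default={DEFAULT}")
print(f"tuned/best={t_tuned / t_best:.3f} default/tuned={t_default / t_tuned:.1f}")
```

With these made-up numbers, the tuner predicts from the 16 GB training runs and picks (16, 400), which runs about 3.4% slower than the search optimum (32, 400) and roughly 8x faster than the default (1, 100). Swapping in a different learner only changes `train` and `predict`; the selection step stays the same.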
