ACO-HCO: Heuristic Performance Tuning Scheme for the Hadoop MapReduce Architecture

Hadoop MapReduce is a widely used cloud computing technology for big data processing. However, the settings of Hadoop configuration parameters can significantly affect execution performance, and tuning these parameters manually is time-consuming and difficult. In this paper, we propose ACO-HCO, a Hadoop configuration tuning scheme for MapReduce applications. We use the job history records of MapReduce applications to generate application-specific job profiles. Based on these profiles, we construct an objective function for execution time using a gene expression programming (GEP) algorithm that mines the correlation among the core Hadoop configuration parameters and the input data size. Using this objective function, a configuration optimizer based on ant colony optimization (ACO) heuristically searches for the optimal configuration of a given application. Experimental results show that ACO-HCO significantly improves Hadoop performance compared with the default configuration. Moreover, ACO-HCO outperforms both a heuristic approach and a cost-based model for Hadoop performance tuning.
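The optimizer stage can be sketched as follows: each ant picks one candidate value per configuration parameter with probability proportional to pheromone, the objective function scores the resulting configuration, and pheromone evaporates and is reinforced on the best configuration of each iteration. This is a minimal sketch under stated assumptions. The abstract does not give the GEP-mined model, so `predicted_time` is a hypothetical toy formula. The value grids are illustrative and include Hadoop's documented defaults. The ACO variant, with per-value pheromone and an iteration-best deposit, is a generic textbook scheme and not necessarily the one used in ACO-HCO.

```python
import random

# Illustrative search space over real Hadoop (2.x+) parameter names.
# The candidate values are assumptions; each grid contains the Hadoop default.
SPACE = {
    "mapreduce.task.io.sort.mb": [50, 100, 200, 400],
    "mapreduce.task.io.sort.factor": [10, 50, 100],
    "mapreduce.job.reduces": [1, 4, 8, 16],
    "mapreduce.map.sort.spill.percent": [0.6, 0.8, 0.9],
}

# Hadoop's default values for the four parameters above.
DEFAULTS = {
    "mapreduce.task.io.sort.mb": 100,
    "mapreduce.task.io.sort.factor": 10,
    "mapreduce.job.reduces": 1,
    "mapreduce.map.sort.spill.percent": 0.8,
}


def predicted_time(cfg, input_gb):
    """Hypothetical stand-in for the GEP-mined execution-time model."""
    # More spills when the sort buffer is small or spills early.
    spills = input_gb * 1024 / cfg["mapreduce.task.io.sort.mb"] / cfg["mapreduce.map.sort.spill.percent"]
    # Merging cost falls as more streams are merged at once.
    merge = spills / cfg["mapreduce.task.io.sort.factor"]
    # Reduce work shrinks with parallelism, but each reducer adds overhead.
    reduces = cfg["mapreduce.job.reduces"]
    return spills + merge + input_gb * 10 / reduces + 2 * reduces


def aco_search(objective, space, ants=20, iters=50, rho=0.1, seed=0):
    """Generic ACO over discrete parameter grids; minimizes objective(cfg)."""
    rng = random.Random(seed)
    # One pheromone level per (parameter, candidate value) pair.
    tau = {p: [1.0] * len(vals) for p, vals in space.items()}
    best_cfg, best_cost = None, float("inf")
    for _ in range(iters):
        solutions = []
        for _ in range(ants):
            # Choose a value index per parameter, weighted by pheromone.
            idx = {p: rng.choices(range(len(v)), weights=tau[p])[0] for p, v in space.items()}
            cfg = {p: space[p][i] for p, i in idx.items()}
            cost = objective(cfg)
            solutions.append((cost, idx))
            if cost < best_cost:
                best_cfg, best_cost = cfg, cost
        # Evaporation keeps the search from locking in too early.
        for p in tau:
            tau[p] = [(1 - rho) * t for t in tau[p]]
        # The iteration-best ant deposits pheromone inversely to its cost.
        it_cost, it_idx = min(solutions, key=lambda s: s[0])
        for p, i in it_idx.items():
            tau[p][i] += 1.0 / it_cost
    return best_cfg, best_cost


if __name__ == "__main__":
    best, cost = aco_search(lambda c: predicted_time(c, input_gb=10), SPACE)
    print("best:", best, "predicted:", round(cost, 1))
    print("default predicted:", round(predicted_time(DEFAULTS, 10), 1))
```

Because the objective function is a cheap model rather than a real job run, the search can evaluate many candidate configurations at little cost. Only the final configuration needs to be validated on the cluster.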