A load-aware scheduler for MapReduce framework in heterogeneous cloud environments

MapReduce is becoming a popular programming model for large-scale data processing in cloud computing environments. Hadoop MapReduce is the most popular open-source implementation of the MapReduce framework. It comes with a pluggable task scheduler interface as well as a default FIFO job scheduler. The default Hadoop scheduler assumes a homogeneous environment and thus does not perform well in heterogeneous environments. Although the LATE scheduler was proposed to schedule tasks and jobs in heterogeneous environments, it does not consider the phenomenon of dynamic loading, which is common in practice. In view of this, we propose a new scheduler, the Load-Aware (LA) scheduler, which addresses the problems caused by dynamic loading and thereby improves the overall performance of Hadoop clusters. Experimental results show that the LA scheduler reduces average response time by up to 20% by avoiding unnecessary speculative tasks.
