Locality and Network-Aware Reduce Task Scheduling for Data-Intensive Applications

MapReduce is one of the leading programming frameworks to implement data-intensive applications by splitting the map and reduce tasks to distributed servers. Although there has been substantial amount of work on map task scheduling and optimization in the literature, the work on reduce task scheduling is very limited. Effective scheduling of the reduce tasks to the resources becomes especially important for the performance of data-intensive applications where large amounts of data are moved between the map and reduce tasks. In this paper, we propose a new algorithm (LoNARS) for reduce task scheduling, which takes both data locality and network traffic into consideration. Data locality awareness aims to schedule the reduce tasks closer to the map tasks to decrease the delay in data access as well as the amount of traffic pushed to the network. Network traffic awareness intends to distribute the traffic over the whole network and minimize the hotspots to reduce the effect of network congestion in data transfers. We have integrated LoNARS into Hadoop-1.2.1. Using our LoNARS algorithm, we achieved up to 15% gain in data shuffling time and up to 3-4% improvement in total job completion time compared to the other reduce task scheduling algorithms. Moreover, we reduced the amount of traffic on network switches by 15% which helps to save energy consumption considerably.

[1]  Ling Liu,et al.  Purlieus: Locality-aware resource allocation for MapReduce in a cloud , 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).

[2]  Scott Shenker,et al.  Delay scheduling: a simple technique for achieving locality and fairness in cluster scheduling , 2010, EuroSys '10.

[3]  Sanjay Ghemawat,et al.  MapReduce: simplified data processing on large clusters , 2008, CACM.

[4]  Sujata Banerjee,et al.  A Power Benchmarking Framework for Network Devices , 2009, Networking.

[5]  Mohammad Hammoud,et al.  Locality-Aware Reduce Task Scheduling for MapReduce , 2011, 2011 IEEE Third International Conference on Cloud Computing Technology and Science.

[6]  Sanjay Ghemawat,et al.  MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.

[7]  Yun Tian,et al.  Improving MapReduce performance through data placement in heterogeneous Hadoop clusters , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW).

[8]  Yuhong Feng,et al.  An effective data locality aware task scheduling method for MapReduce framework in heterogeneous environments , 2011, 2011 International Conference on Cloud and Service Computing.

[9]  Mohammad Hammoud,et al.  Center-of-Gravity Reduce Task Scheduling to Lower MapReduce Network Traffic , 2012, 2012 IEEE Fifth International Conference on Cloud Computing.

[10]  Seung Ryoul Maeng,et al.  Locality-aware dynamic VM reconfiguration on MapReduce clouds , 2012, HPDC '12.