Design Dynamic Data Allocation Scheduler to Improve MapReduce Performance in Heterogeneous Clouds

This paper studies the MapReduce programming model, one of the critical technologies in cloud computing. Some earlier work improves MapReduce performance by allocating identical tasks to every cloud node. Such allocation does not suit a heterogeneous cloud: because nodes differ in computing power and system resources, a uniform distribution of tasks lowers overall performance. This paper therefore improves on the original speculative execution method of Hadoop and on the LATE Scheduler by proposing a new scheduling scheme, the Dynamic Data Allocation Scheduler (DDAS). DDAS uses more accurate methods to determine the response time and the backup tasks that affect the system, which is expected to raise the success ratio of backup tasks and thereby improve the system's responsiveness. In three different simulation experiments, DDAS reduces execution time by 30%, 18%, and 21%, respectively, relative to Hadoop. DDAS also shows more accurate speculative execution and a more reasonable allocation of backup tasks. Hence, DDAS can effectively enhance the performance of MapReduce processing in a heterogeneous cloud environment.
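For readers unfamiliar with speculative execution, the following is a minimal sketch of the time-remaining heuristic published for the LATE Scheduler, which the abstract names as a baseline. It is not the DDAS algorithm, whose details are not given here. The names `TaskProgress`, `estimated_time_left`, and `pick_backup_candidate` are hypothetical, the sample values are illustrative only, and the thresholds that LATE applies before launching a backup are omitted.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TaskProgress:
    """Hypothetical snapshot of one running task (not a Hadoop type)."""
    task_id: str
    progress: float   # progress score in [0, 1]
    elapsed_s: float  # seconds since the task started


def estimated_time_left(task: TaskProgress) -> float:
    """LATE-style estimate: remaining work divided by observed progress rate."""
    if task.progress <= 0 or task.elapsed_s <= 0:
        return float("inf")  # no progress observed yet, so no finite estimate
    rate = task.progress / task.elapsed_s
    return (1.0 - task.progress) / rate


def pick_backup_candidate(tasks: List[TaskProgress]) -> Optional[TaskProgress]:
    """Return the unfinished task expected to finish last, or None."""
    running = [t for t in tasks if t.progress < 1.0]
    if not running:
        return None
    return max(running, key=estimated_time_left)


# Illustrative values only: three map tasks on nodes of different speed.
tasks = [
    TaskProgress("map-0", 0.9, 90.0),   # fast node
    TaskProgress("map-1", 0.2, 100.0),  # slow node
    TaskProgress("map-2", 0.5, 50.0),
]
candidate = pick_backup_candidate(tasks)
print(candidate.task_id if candidate else "no running tasks")
```

On a heterogeneous cluster the task with the longest estimated time left, rather than the one with the lowest progress score, is the natural candidate for a backup copy, which is why the estimate divides remaining work by the observed rate.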
