On Scheduling Algorithms for MapReduce Jobs in Heterogeneous Clouds with Budget Constraints

In this paper, we consider task-level scheduling algorithms under budget constraints for a bag of MapReduce jobs running on a set of provisioned heterogeneous (virtual) machines in cloud platforms. The heterogeneity is manifested in the popular "pay-as-you-go" charging model, in which machines with different performance have different service rates. We organize the bag of jobs as a κ-stage workflow and consider the scheduling problem with budget constraints. In particular, given a total monetary budget, we first combine a greedy, locally optimal algorithm with dynamic programming techniques to obtain a globally optimal scheduling algorithm that achieves the minimum scheduling length of the workflow in pseudo-polynomial time. We then extend the idea behind the greedy algorithm to distribute the budget efficiently among the tasks in different stages, reducing the overall scheduling length. Our empirical studies verify the proposed optimal algorithm and show the efficiency of the greedy algorithm in minimizing the scheduling length.
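To make the combination of a per-stage greedy step and dynamic programming over the budget more concrete, the following is a minimal Python sketch. It is not the algorithm from the paper: the cost model is an assumption chosen for illustration (each task has a list of hypothetical `(price, time)` machine options, tasks in a stage run in parallel, stages run one after another, and the budget is an integer), and the function names `stage_length` and `min_workflow_length` are invented here.

```python
import math


def stage_length(tasks, budget):
    """Greedy step for one stage (illustrative assumption, not the paper's code).

    tasks: non-empty list of tasks; each task is a list of (price, time) options.
    Returns the stage length (time of its slowest task) reachable within budget,
    or math.inf if even the cheapest options are unaffordable.
    """
    # For each task keep options sorted by price, dropping any option that
    # is not strictly faster than a cheaper one.
    opts = []
    for options in tasks:
        frontier = []
        for price, time in sorted(options):
            if not frontier or time < frontier[-1][1]:
                frontier.append((price, time))
        opts.append(frontier)

    choice = [0] * len(opts)               # start every task on its cheapest option
    spent = sum(o[0][0] for o in opts)
    if spent > budget:
        return math.inf

    while True:
        # Only upgrading the slowest task can shorten the stage.
        slowest = max(range(len(opts)), key=lambda i: opts[i][choice[i]][1])
        nxt = choice[slowest] + 1
        if nxt >= len(opts[slowest]):
            break                           # slowest task is already on its fastest option
        extra = opts[slowest][nxt][0] - opts[slowest][choice[slowest]][0]
        if spent + extra > budget:
            break                           # the next upgrade is not affordable
        choice[slowest] = nxt
        spent += extra

    return max(opts[i][choice[i]][1] for i in range(len(opts)))


def min_workflow_length(stages, budget):
    """Dynamic program over stages and integer budget amounts.

    best[b] is the minimum total length of the stages processed so far
    when at most b budget units are available to them.
    """
    best = [0.0] * (budget + 1)
    for tasks in reversed(stages):
        lengths = [stage_length(tasks, b) for b in range(budget + 1)]
        best = [
            min(lengths[x] + best[b - x] for x in range(b + 1))
            for b in range(budget + 1)
        ]
    return best[budget]


# Hypothetical two-stage workflow: stage 1 has two tasks, stage 2 has one.
example_stages = [
    [[(1, 10), (3, 4)], [(1, 8), (2, 5)]],
    [[(2, 6), (5, 2)]],
]
```

Under these assumptions, `min_workflow_length(example_stages, 7)` spends 5 units on the first stage and 2 on the second. The table has one entry per budget unit and each entry scans all possible splits, so the running time grows with the numeric value of the budget rather than with its encoding length, which is the sense in which such a procedure is pseudo-polynomial.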
