Stochastic Tail-Phase Optimization for Bag-of-Tasks Execution in Clouds

Elastic applications like bags of tasks benefit greatly from Infrastructure as a Service (IaaS) clouds that let users allocate compute resources on demand, charging based on reserved time intervals. Users, however, still need guidance for mapping their applications onto multiple IaaS offerings, both minimizing execution time and respecting budget limitations. For budget-controlled execution of bags of tasks, we built BaTS, a scheduler that estimates possible budget and makespan combinations using a tiny task sample, and then executes a bag within the user's budget constraints. Previous work has shown the efficacy of this approach. There remains, however, the risk of outlier tasks causing the execution to exceed the predicted makespan. In this work, we present a stochastic optimization of the tail phase of BaTS' execution. The main idea is to use the otherwise idle machines until the end of their (already paid-for) allocation time. Using the task completion time information acquired during the execution, BaTS decides which tasks to replicate onto idle machines in the tail phase, reducing the makespan and improving the tolerance to outlier tasks. Our evaluation results show that this effect is robust with respect to the quality of runtime predictions and is strongest with more expensive schedules in which many fast machines are available.
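
To make the tail-phase idea concrete, the following is a minimal sketch of a greedy replication decision. It is not the BaTS algorithm: it uses plain point estimates rather than a stochastic model, and all names (`RunningTask`, `IdleMachine`, `plan_replicas`) and the selection rule itself are assumptions made for illustration only. A running task is replicated onto an idle machine only if the replica is expected to finish before the original copy and within the machine's already paid-for time.

```python
from dataclasses import dataclass


@dataclass
class RunningTask:
    task_id: str
    expected_remaining: float  # estimated minutes left on its current machine


@dataclass
class IdleMachine:
    machine_id: str
    paid_time_left: float      # minutes left in the already paid-for interval
    expected_runtime: float    # estimated minutes for a fresh task on this machine


def plan_replicas(running, idle):
    """Greedily pair the slowest-looking running tasks with idle machines.

    Hypothetical rule: a replica is scheduled only if it is expected to
    finish before the original copy and within the machine's paid-for time.
    Returns a list of (task_id, machine_id) pairs.
    """
    # Tasks with the largest expected remaining time threaten the makespan most.
    tasks = sorted(running, key=lambda t: t.expected_remaining, reverse=True)
    # Try the fastest idle machines first.
    machines = sorted(idle, key=lambda m: m.expected_runtime)

    plan = []
    for task in tasks:
        for machine in machines:
            fits_paid_time = machine.expected_runtime <= machine.paid_time_left
            beats_original = machine.expected_runtime < task.expected_remaining
            if fits_paid_time and beats_original:
                plan.append((task.task_id, machine.machine_id))
                machines.remove(machine)  # each idle machine hosts one replica
                break
    return plan


running = [
    RunningTask("A", expected_remaining=20.0),
    RunningTask("B", expected_remaining=5.0),
    RunningTask("C", expected_remaining=12.0),
]
idle = [
    IdleMachine("m1", paid_time_left=15.0, expected_runtime=8.0),
    IdleMachine("m2", paid_time_left=6.0, expected_runtime=10.0),
    IdleMachine("m3", paid_time_left=30.0, expected_runtime=11.0),
]
plan = plan_replicas(running, idle)
print(plan)
```

In this made-up example, machine `m2` is never used because a fresh task would not fit into its remaining paid-for time, and task `B` is not replicated because no remaining machine would beat its original copy. No extra cost is incurred either way, since only already paid-for machine time is considered.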
