Robust task scheduling for volunteer computing systems

Performance perturbations are a natural phenomenon in volunteer computing systems, and scheduling parallel applications with precedence constraints is emerging as a new challenge in these systems. In this paper, we propose two novel robust task scheduling heuristics that identify the best task-resource matches in terms of makespan and robustness. Both heuristics are based on a proactive reallocation (or schedule expansion) scheme that enables output schedules to tolerate a certain degree of performance degradation. Schedules are initially generated with a focus on makespan. They are then scrutinized for possible rescheduling onto additional volunteer computing resources to increase their robustness. Specifically, robustness is improved by maximizing either the total allowable delay time or the minimum relative allowable delay time over all allocated volunteer resources. These allowable delay times arise from the precedence constraints among tasks. The two proposed heuristics are evaluated with an extensive set of simulations, and the results show that our approach significantly improves the robustness of the resulting schedules.
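
As an illustration of the two robustness measures mentioned above, the following is a minimal sketch and not the heuristics proposed in the paper. The abstract does not define allowable delay time precisely, so the sketch assumes it is the slack of a task: how far its finish time can slip without increasing the makespan, given its successors in the task graph and the next task on the same resource. It further assumes that the relative allowable delay of a resource is its total slack divided by its busy time, that communication costs are zero, and that every task has a positive duration. The function names `allowable_delays` and `robustness_metrics` are hypothetical.

```python
from collections import defaultdict


def allowable_delays(tasks, succs):
    """Slack per task (assumed definition, see the lead-in).

    tasks: {name: (resource, start, finish)} for a fixed schedule.
    succs: {name: [names of successor tasks in the task graph]}.
    """
    makespan = max(finish for _, _, finish in tasks.values())

    # The next task on the same resource also limits how far a task can slip.
    by_resource = defaultdict(list)
    for name, (res, start, _) in tasks.items():
        by_resource[res].append((start, name))
    next_on_resource = {}
    for items in by_resource.values():
        items.sort()
        for (_, earlier), (_, later) in zip(items, items[1:]):
            next_on_resource[earlier] = later

    # Backward pass: visit tasks by decreasing start time, so every
    # follower of a task is processed before the task itself.
    latest_finish = {}
    for name in sorted(tasks, key=lambda n: tasks[n][1], reverse=True):
        followers = list(succs.get(name, []))
        if name in next_on_resource:
            followers.append(next_on_resource[name])
        limit = makespan
        for f in followers:
            _, f_start, f_finish = tasks[f]
            # The follower must start no later than its own latest start.
            limit = min(limit, latest_finish[f] - (f_finish - f_start))
        latest_finish[name] = limit

    return {n: latest_finish[n] - tasks[n][2] for n in tasks}


def robustness_metrics(tasks, succs):
    """Return (total slack, minimum relative slack over used resources)."""
    slack = allowable_delays(tasks, succs)
    slack_sum = defaultdict(float)
    busy = defaultdict(float)
    for name, (res, start, finish) in tasks.items():
        slack_sum[res] += slack[name]
        busy[res] += finish - start
    total = sum(slack_sum.values())
    min_relative = min(slack_sum[r] / busy[r] for r in busy)
    return total, min_relative


# Hypothetical four-task schedule on two resources.
example_tasks = {
    "A": ("r1", 0, 2),
    "B": ("r1", 2, 5),
    "C": ("r2", 2, 3),
    "D": ("r1", 5, 7),
}
example_succs = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}
print(allowable_delays(example_tasks, example_succs))
print(robustness_metrics(example_tasks, example_succs))
```

In this example only task C, which runs alone on `r2`, has any slack. The total allowable delay is therefore positive while the minimum relative allowable delay is zero, because `r1` carries the critical path. This shows why the two objectives can rank the same schedule differently.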
