Mapping Tightly-Coupled Applications on Volatile Resources

Platforms that comprise volatile processors, such as desktop grids, have traditionally been used to execute independent-task applications. In this work, we study the scheduling of tightly-coupled iterative master-worker applications onto volatile processors. The main challenge is that workers must be simultaneously available for the application to make progress. We consider two additional complications: workers can be temporarily reclaimed, and, for data-intensive applications, the limited bandwidth between the master and the workers must be taken into account. In this context, our first contribution is a theoretical study of the off-line version of the scheduling problem, i.e., when processor availability is known in advance; even in this case, the problem is NP-hard. Our second contribution is an analytical approximation of the expected time needed by a set of workers to complete a set of tasks, and of the probability that this computation succeeds. This approximation relies on a Markovian assumption for the temporal availability of processors. Our third contribution is a set of heuristics, some of which use this approximation to favor reliable processors in a sensible manner. We evaluate these heuristics in simulation, identify some that significantly outperform their competitors, and derive guidelines for heuristic design.
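The abstract does not spell out the availability model, so the following is only a hedged illustration of the quantities involved. It assumes that each processor follows an independent discrete-time three-state Markov chain (UP, RECLAIMED, DOWN), that workers start UP, that one unit of work is done in each time slot in which every enlisted worker is UP, that a slot with any RECLAIMED worker is idle, and that the run fails as soon as any worker goes DOWN. Under these assumptions, the sketch computes the success probability and the expected completion time given success *exactly*, by brute force over the joint chain. It is not the paper's analytical approximation, and the function name `success_and_expected_time` and the unit-work-per-slot convention are hypothetical.

```python
import numpy as np


def success_and_expected_time(chains, w):
    """Exact P(success) and E[completion time | success] for w work slots.

    Hypothetical model (illustrative assumptions, not the paper's exact one):
    each chain is a 3x3 row-stochastic matrix over (UP, RECLAIMED, DOWN);
    workers start UP and evolve independently; one unit of work is done in
    each slot where every worker is UP; any RECLAIMED worker idles the slot;
    the run fails as soon as any worker enters DOWN before the w-th work slot.
    """
    if not chains or w < 1:
        raise ValueError("need at least one worker and w >= 1")
    joint = np.ones((1, 1))
    for p in chains:
        p = np.asarray(p, dtype=float)
        if p.shape != (3, 3) or not np.allclose(p.sum(axis=1), 1.0):
            raise ValueError("each chain must be a 3x3 row-stochastic matrix")
        # Keep only UP/RECLAIMED; probability mass sent to DOWN is lost.
        joint = np.kron(joint, p[:2, :2])
    if w == 1:
        return 1.0, 1.0  # the first slot already has every worker UP
    # Joint index 0 is "all UP"; indices 1.. are "all alive, not all UP".
    t_aa, t_an = joint[0, 0], joint[0, 1:]
    t_na, t_nn = joint[1:, 0], joint[1:, 1:]
    lhs = np.eye(len(t_nn)) - t_nn
    h = np.linalg.solve(lhs, t_na)  # P(reach all-UP before any DOWN)
    g = np.linalg.solve(lhs, h)     # E[hitting time * 1{no DOWN first}]
    q = t_aa + t_an @ h             # P(all-UP -> all-UP with no DOWN)
    m1 = q + t_an @ g               # E[return time * 1{no DOWN first}]
    if q == 0.0:
        return 0.0, float("nan")    # success impossible; time undefined
    # Each of the remaining w - 1 work slots needs one independent return
    # to the all-UP state (strong Markov property).
    return float(q ** (w - 1)), float(1 + (w - 1) * m1 / q)
```

For example, a single worker with transition matrix `[[0.5, 0.5, 0], [1, 0, 0], [0, 0, 1]]` never fails but is reclaimed half the time, so `success_and_expected_time([...], 3)` gives a success probability of 1 and an expected time of 4 slots. The Kronecker product makes the joint chain grow as 2^k for k workers, so this exact computation is only practical for small worker sets; that blow-up is one plausible reason an analytical approximation is useful when selecting among many processors.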
