Mapping applications on volatile resources

In this paper, we study the execution of iterative applications on volatile processors such as those found on desktop grids. We envision two models, one where all tasks are assumed to be independent, and another where all tasks are tightly coupled and keep exchanging information throughout the iteration. These two models cover the two extreme points of the parallelization spectrum. We develop master–worker scheduling schemes that attempt to achieve good trade-offs between worker speed and worker availability. Any iteration entails the execution of a fixed number of independent tasks or of tightly coupled tasks. A key feature of our approach is that we consider a communication model where the bandwidth capacity of the master for sending application data to workers is limited. This limitation makes the scheduling problem more difficult both in a theoretical sense and in a practical sense. Furthermore, we consider that a processor can be in one of three states: available, down, or temporarily preempted by its owner. This preempted state also complicates the scheduling problem. In practical settings, for example desktop grids, master bandwidth is limited and processors are temporarily reclaimed. Consequently, addressing the aforementioned difficulties is necessary for successfully deploying master–worker applications on volatile platforms. Our first contribution is to determine the complexity of the scheduling problems in their offline versions, that is, when processor availability behaviors are known in advance. Even with this knowledge, the problems are NP-hard. Our second contribution is an evaluation of the expectation of the time needed by a worker to complete a set of tasks. We obtain a closed-form formula for independent tasks and an analytical approximation for tightly coupled tasks. Those evaluations rely on a Markovian assumption for the temporal availability of processors, and are at the heart of some heuristics that aim at favoring ‘reliable’ processors in a sensible manner. Our third contribution is a set of heuristics for both models, which we evaluate in simulation. Our results provide guidance in selecting the best strategy as a function of processor state availability versus average task duration.
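To make the three-state Markovian availability assumption concrete, the following is a minimal sketch and not the paper's own formulation. It assumes discrete time slots, a worker that starts available ("u"), makes one unit of progress per available slot, pauses while preempted ("r"), and loses the run as soon as it goes down. The dictionary keys, function names, and transition probabilities are all hypothetical. Under these assumptions, the probability of finishing a given amount of work and the expected number of slots (conditional on finishing) have simple closed forms, which a small Monte Carlo simulation can cross-check.

```python
import random

# Hypothetical transition probabilities (assumed for illustration only).
# Keys: "uu" = available -> available, "ur" = available -> preempted,
# "ru" = preempted -> available, "rr" = preempted -> preempted.
# The remaining mass of each row is the probability of going down.
# The formulas below require p["rr"] < 1.
P = {"uu": 0.90, "ur": 0.08, "ru": 0.50, "rr": 0.45}


def back_to_up(p):
    """Probability of reaching the next available slot before going down."""
    return p["uu"] + p["ur"] * p["ru"] / (1.0 - p["rr"])


def success_probability(p, work):
    """Probability of completing `work` available slots before going down."""
    return back_to_up(p) ** (work - 1)


def expected_slots(p, work):
    """Expected slots to complete `work` units, conditional on not going down."""
    # Expected number of preempted slots between two available slots,
    # weighted by the probability of returning through the preempted state.
    paused = p["ur"] * p["ru"] / (1.0 - p["rr"]) ** 2
    return work + (work - 1) * paused / back_to_up(p)


def simulate(p, work, trials, seed=0):
    """Monte Carlo estimate of (success probability, mean slots on success)."""
    rng = random.Random(seed)
    finished, total_slots = 0, 0
    for _ in range(trials):
        state, done, slots = "u", 0, 0
        while True:
            slots += 1
            if state == "u":
                done += 1
                if done == work:
                    finished += 1
                    total_slots += slots
                    break
            stay_or_up, to_r = (
                (p["uu"], p["ur"]) if state == "u" else (p["ru"], p["rr"])
            )
            x = rng.random()
            if x < stay_or_up:
                state = "u"
            elif x < stay_or_up + to_r:
                state = "r"
            else:
                break  # worker went down: the run is lost
    return finished / trials, total_slots / max(finished, 1)


if __name__ == "__main__":
    print(success_probability(P, 10), expected_slots(P, 10))
    print(simulate(P, 10, trials=20000))
```

A scheduler favoring 'reliable' processors could, for instance, rank workers by such quantities; how the heuristics in the paper actually combine them is not specified in this abstract.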