Fault-aware scheduling for Bag-of-Tasks applications on Desktop Grids

Desktop grids have proved to be a suitable platform for the execution of bag-of-tasks applications but, being characterized by high resource volatility, they require scheduling techniques that can effectively deal with resource failures and/or unplanned periods of unavailability. In this paper we present a set of fault-aware scheduling policies that, rather than just tolerating faults as traditional fault-tolerant schedulers do, exploit information about resource availability to improve application performance. We compared the performance of these strategies, via simulation, with that attained by traditional fault-tolerant schedulers. Our results, obtained for a set of realistic scenarios modeled after real desktop grids, show that our approach yields better application performance and resource utilization.
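
The abstract does not spell out the individual policies, so the following is only a minimal, hypothetical sketch of the general idea of using availability information when placing a task. It assumes each machine has a known speed and a predicted mean time to failure, and that availability intervals are exponentially distributed; the names `Machine`, `mttf_hours`, `completion_probability`, `fault_aware_pick` and `fastest_pick` are illustrative and are not taken from the paper. An availability-oblivious scheduler simply picks the fastest machine, while the availability-aware variant picks the machine most likely to finish the task without being interrupted.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Machine:
    name: str
    speed: float       # hypothetical: work units processed per hour
    mttf_hours: float  # hypothetical: predicted mean time to failure, in hours


def completion_probability(task_work: float, m: Machine) -> float:
    """Probability that the machine stays available for the whole task run,
    ASSUMING exponentially distributed availability intervals."""
    runtime = task_work / m.speed
    return math.exp(-runtime / m.mttf_hours)


def fault_aware_pick(task_work: float, machines: Sequence[Machine]) -> Machine:
    """Availability-aware choice: maximize the chance of uninterrupted completion."""
    return max(machines, key=lambda m: completion_probability(task_work, m))


def fastest_pick(task_work: float, machines: Sequence[Machine]) -> Machine:
    """Availability-oblivious baseline: pick the fastest machine regardless of volatility."""
    return max(machines, key=lambda m: m.speed)


if __name__ == "__main__":
    machines = [
        Machine("fast-but-flaky", speed=4.0, mttf_hours=1.0),
        Machine("slow-but-stable", speed=2.0, mttf_hours=20.0),
    ]
    print(fastest_pick(8.0, machines).name)      # fast-but-flaky
    print(fault_aware_pick(8.0, machines).name)  # slow-but-stable
```

With these made-up numbers, the fast machine needs 2 hours but survives that long with probability e^-2 (about 0.14), while the slow machine needs 4 hours and survives with probability e^-0.2 (about 0.82), so the availability-aware rule prefers the slower, more stable host.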
