A probabilistic and adaptive scheduling algorithm using system-generated predictions for inter-grid resource sharing

Abstract Rapid advances in Grid technologies, and their increasing availability, have encouraged many businesses and researchers to establish Virtual Organizations (VOs) and to use their available desktop resources to solve compute-intensive problems. These VOs, however, operate as disjoint, independent communities with no resource sharing between them. In previous work, we proposed a fully decentralized and reconfigurable Inter-Grid framework for resource sharing among such distributed and autonomous Grid systems (Rao et al. in ICCSA, [2006]). The central problem in such a system of collaborating Grids is resource scheduling, because the distributed and autonomous nature of the underlying Grid entities leaves very little knowledge about resource availability. In this paper, we propose a probabilistic and adaptive scheduling algorithm that uses system-generated predictions for Inter-Grid resource sharing while keeping the collaborating Grid systems autonomous and independent. We first obtain system-generated job runtime estimates without actually submitting jobs to the target Grid system. These runtime estimates are then used to predict the feasibility of scheduling a job on the target system. Furthermore, the proposed algorithm adapts itself to the actual resource behavior and performance. Simulation results are presented to discuss the correctness and accuracy of the proposed algorithm.
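The abstract does not specify the estimator, the feasibility rule, or the adaptation step, so the following is only an illustrative sketch of how the three stages could fit together. Everything in it is an assumption made for illustration: the class name `AdaptivePredictor`, the use of the mean of the last few observed runtimes as the system-generated estimate, the deadline-based feasibility check, and the exponential smoothing used for adaptation. It should not be read as the algorithm proposed in the paper.

```python
from collections import defaultdict, deque
from statistics import mean


class AdaptivePredictor:
    """Hypothetical sketch; not the paper's algorithm."""

    def __init__(self, history=2, alpha=0.2):
        # Last `history` observed runtimes per job class (assumed grouping key).
        self.recent = defaultdict(lambda: deque(maxlen=history))
        self.alpha = alpha    # smoothing weight for the adaptation step
        self.success = 0.5    # running estimate of how often jobs met their deadline

    def estimate_runtime(self, job_class, default):
        """System-generated estimate: mean of recent runtimes, else a default."""
        runs = self.recent[job_class]
        return mean(runs) if runs else default

    def feasibility(self, job_class, default, deadline, queue_wait):
        """Rough score in [0, 1] that the job would finish before its deadline."""
        est = self.estimate_runtime(job_class, default)
        fits = 1.0 if queue_wait + est <= deadline else 0.0
        # Discount the point verdict by the observed deadline success rate.
        return fits * self.success

    def observe(self, job_class, actual_runtime, met_deadline):
        """Adapt to actual behavior once a job has really run."""
        self.recent[job_class].append(actual_runtime)
        hit = 1.0 if met_deadline else 0.0
        self.success = (1 - self.alpha) * self.success + self.alpha * hit
```

In this sketch a caller would query `feasibility` before deciding whether to forward a job to a remote Grid, and call `observe` after each completed job so that both the runtime estimate and the confidence in it track the observed behavior of the target system.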

[1]  Ciprian Dobre,et al.  MonALISA: An agent based, dynamic service system to monitor, control and optimize grid base applications , 2005 .

[2]  Peter A. Dinda Online prediction of the running time of tasks: Summary , 2001, SIGMETRICS 2001.

[3]  John F. Karpovich,et al.  The Legion Resource Management System , 1999, JSSPP.

[4]  Carla E. Brodley,et al.  Predictive application-performance modeling in a computational grid environment , 1999, Proceedings. The Eighth International Symposium on High Performance Distributed Computing (Cat. No.99TH8469).

[5]  Rajkumar Buyya,et al.  A Cost-Aware Resource Exchange Mechanism for Load Management across Grids , 2008, 2008 14th IEEE International Conference on Parallel and Distributed Systems.

[6]  Ian T. Foster,et al.  Homeostatic and tendency-based CPU load predictions , 2003, Proceedings International Parallel and Distributed Processing Symposium.

[7]  Rajkumar Buyya,et al.  A grid service broker for scheduling distributed data-oriented applications on global grids , 2004, MGC '04.

[8]  Wolfgang Gentzsch,et al.  Sun Grid Engine: towards creating a compute power grid , 2001, Proceedings First IEEE/ACM International Symposium on Cluster Computing and the Grid.

[9]  Fredric Messing Predicting scheduling success , 1993 .

[10]  Hans-Joachim Hof,et al.  A Generic, Self-organizing, and Distributed Bootstrap Service for Peer-to-Peer Networks , 2007, IWSOS.

[11]  Y. Charlie Hu,et al.  A Self-Organizing Flock of Condors , 2003, ACM/IEEE SC 2003 Conference (SC'03).

[12]  Ian T. Foster,et al.  Condor-G: A Computation Management Agent for Multi-Institutional Grids , 2004, Cluster Computing.

[13]  Dror G. Feitelson,et al.  Job Characteristics of a Production Parallel Scientific Workload on the NASA Ames iPSC/860 , 1995, JSSPP.

[14]  Ian T. Foster,et al.  The Anatomy of the Grid: Enabling Scalable Virtual Organizations , 2001, Int. J. High Perform. Comput. Appl..

[15]  Francisco Vilar Brasileiro,et al.  Exploiting Replication and Data Reuse to Efficiently Schedule Data-Intensive Applications on Grids , 2004, JSSPP.

[16]  Chunming Hu,et al.  CGSP: An Extensible and Reconfigurable Grid Framework , 2005, APPT.

[17]  Lee C. Potter,et al.  Statistical prediction of task execution times through analytic benchmarking for scheduling in a heterogeneous environment , 1999, Proceedings. Eighth Heterogeneous Computing Workshop (HCW'99).

[18]  Gabriel Mateescu Quality of Service on the Grid Via Metascheduling with Resource Co-Scheduling and Co-Reservation , 2003, Int. J. High Perform. Comput. Appl..

[19]  Jingwen Wang,et al.  Utopia: A load sharing facility for large, heterogeneous distributed computer systems , 1993, Softw. Pract. Exp..

[20]  Richard Gibbons,et al.  A Historical Application Profiler for Use by Parallel Schedulers , 1997, JSSPP.

[21]  Rajkumar Buyya,et al.  A Resource Exchange Mechanism for Peak Load Management in InterGrid Environments , 2007 .

[22]  Lingyun Yang,et al.  Conservative Scheduling: Using Predicted Variance to Improve Scheduling Decisions in Dynamic Environments , 2003, ACM/IEEE SC 2003 Conference (SC'03).

[23]  Dan Tsafrir,et al.  Backfilling Using System-Generated Predictions Rather than User Runtime Estimates , 2007, IEEE Transactions on Parallel and Distributed Systems.

[24]  Rajkumar Buyya,et al.  Managing Risk of Inaccurate Runtime Estimates for Deadline Constrained Job Admission Control in Clusters , 2006, 2006 International Conference on Parallel Processing (ICPP'06).

[25]  Youcef Derbal,et al.  A probabilistic scheduling heuristic for computational grids , 2006, Multiagent Grid Syst..

[26]  Miron Livny,et al.  Condor-a hunter of idle workstations , 1988, [1988] Proceedings. The 8th International Conference on Distributed.

[27]  Amin Vahdat,et al.  Resource Allocation in Federated Distributed Computing Infrastructures , 2004 .

[28]  Peter A. Dinda,et al.  Online Prediction of the Running Time of Tasks , 2001, Proceedings 10th IEEE International Symposium on High Performance Distributed Computing.

[29]  P. Sadayappan,et al.  Distributed job scheduling on computational Grids using multiple simultaneous requests , 2002, Proceedings 11th IEEE International Symposium on High Performance Distributed Computing.

[30]  Sungyoung Lee,et al.  Distributed, Scalable and Reconfigurable Inter-grid Resource Sharing Framework , 2006, ICCSA.

[31]  Francisco Vilar Brasileiro,et al.  Trading Cycles for Information: Using Replication to Schedule Bag-of-Tasks Applications on Computational Grids , 2003, Euro-Par.

[32]  R. F. Freund,et al.  Dynamic Mapping of a Class of Independent Tasks onto Heterogeneous Computing Systems , 1999, J. Parallel Distributed Comput..

[33]  Warren Smith,et al.  Benchmarks and Standards for the Evaluation of Parallel Job Schedulers , 1999, JSSPP.

[34]  Rajkumar Buyya,et al.  A model for cooperative federation of distributed clusters , 2005, HPDC-14. Proceedings. 14th IEEE International Symposium on High Performance Distributed Computing, 2005..

[35]  Akshai K. Aggarwal,et al.  An adaptive generalized scheduler for grid applications , 2005, 19th International Symposium on High Performance Computing Systems and Applications (HPCS'05).

[36]  Allen B. Downey Predicting queue times on space-sharing parallel computers , 1997, Proceedings 11th International Parallel Processing Symposium.

[37]  Robert L. Henderson,et al.  Job Scheduling Under the Portable Batch System , 1995, JSSPP.

[38]  Thu D. Nguyen,et al.  Using Runtime Measured Workload Characteristics in Parallel Processor Scheduling , 1996, JSSPP.

[39]  Ian T. Foster,et al.  Grid information services for distributed resource sharing , 2001, Proceedings 10th IEEE International Symposium on High Performance Distributed Computing.

[40]  Richard Wolski,et al.  The network weather service: a distributed resource performance forecasting service for metacomputing , 1999, Future Gener. Comput. Syst..

[41]  Xingfu Wu,et al.  Using Performance Prediction to Allocate Grid Resources , 2004 .

[42]  Francine Berman,et al.  The AppLeS Parameter Sweep Template: User-Level Middleware for the Grid , 2000, ACM/IEEE SC 2000 Conference (SC'00).

[43]  Ciprian Dobre,et al.  MonALISA: An agent based, dynamic service system to monitor, control and optimize distributed systems , 2009, Comput. Phys. Commun..

[44]  David Abramson,et al.  A Computational Economy for Grid Computing and its Implementation in the Nimrod-G Resource Brok , 2001, Future Gener. Comput. Syst..

[45]  Ming Wu,et al.  Grid Harvest Service: a system for long-term, application-level task scheduling , 2003, Proceedings International Parallel and Distributed Processing Symposium.