Using Simulation, Historical and Hybrid Estimation Systems for Enhancing Job Scheduling on NOWs

The workstations in an open laboratory have enough computational capacity to execute not only the local workload but also some distributed computation. Unfortunately, the local workload makes the system's behavior much harder to predict, which limits the applicability of job scheduling strategies. In this work, we introduce an estimation engine into our job scheduling system, termed CISNE. This prediction capability allows us to guarantee some limits on the turnaround time of parallel jobs. To this end, three estimation methods have been proposed and implemented in the CISNE system: a simulation tool, a historical system, and a hybrid that integrates both. Within this framework, we have compared our proposals with representative estimation methods from the literature. Likewise, we have analyzed these estimation methods in relation to different scheduling policies. The results reveal that the hybrid method achieves the best performance because it combines the flexibility of a simulator, which can represent a system as dynamic as a non-dedicated cluster, with the accuracy provided by historical information.
