Predicting job start times on clusters

In a computational Grid consisting of many computer clusters, job start time predictions help guide resource selection and balance the workload distribution. However, the basic Grid middleware available today either has no means of expressing how long a site will take to start a job or uses a simple linear scale. In this paper we introduce a system for predicting job start times on clusters. Our predictions are based on statistical analysis of historical job traces and simulation of site schedulers. We have deployed the system on the EDG (European Data-Grid) production cluster at NIKHEF. The experimental results show that the predictions achieve acceptable accuracy and reflect real site states and site-specific scheduling policies. The average error of our job start time predictions is 18.9 percent of the average job queue wait time, around 20 times smaller than the average prediction error of the EDG solution.
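The two ingredients named above, run time estimates drawn from historical traces and a simulation of the site scheduler, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, since the abstract does not specify the statistical model or the scheduling policy: run times are estimated as the per-user mean of past jobs, the scheduler is first-come-first-served with single-CPU jobs, and the function names (`estimate_runtime`, `predict_start_times`, `relative_error`) are hypothetical. The error measure is one plausible reading of "average error as a percentage of the average queue wait time".

```python
import heapq
from statistics import mean


def estimate_runtime(history, user, default=3600.0):
    """Mean of this user's past run times (seconds); default if none recorded."""
    past = history.get(user, [])
    return mean(past) if past else default


def predict_start_times(now, n_cpus, running, queued, history):
    """Simulate an assumed first-come-first-served scheduler, one CPU per job.

    running: list of (user, start_time) for jobs currently holding a CPU
    queued:  list of (job_id, user) in queue order
    Assumes n_cpus >= 1 and len(running) <= n_cpus.
    Returns {job_id: predicted_start_time}.
    """
    # Each heap entry is the time at which one CPU becomes free.
    free_at = [now] * (n_cpus - len(running))
    for user, started in running:
        # A job that has outlived its estimate is assumed to end right now.
        free_at.append(max(now, started + estimate_runtime(history, user)))
    heapq.heapify(free_at)

    predictions = {}
    for job_id, user in queued:
        start = heapq.heappop(free_at)  # earliest CPU to become free
        predictions[job_id] = start
        heapq.heappush(free_at, start + estimate_runtime(history, user))
    return predictions


def relative_error(predicted_waits, actual_waits):
    """Mean absolute prediction error divided by the mean actual wait time."""
    errors = [abs(p - a) for p, a in zip(predicted_waits, actual_waits)]
    return mean(errors) / mean(actual_waits)


history = {"alice": [100, 300], "bob": [50]}
predictions = predict_start_times(
    now=1000,
    n_cpus=2,
    running=[("alice", 900), ("bob", 900)],
    queued=[("j1", "alice"), ("j2", "bob"), ("j3", "alice")],
    history=history,
)
print(predictions)
```

A real site scheduler would also apply priorities, fair-share rules, and multi-CPU allocations, so the simulation step would need to mirror the local policy to reflect site-specific behaviour.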
