Predicting CPU Availability on the Computational Grid Using the Network Weather Service

In this paper, we focus on the problem of predicting CPU availability for Computational Grid settings in which individual machines may be either time-shared or batch-controlled. We use the Network Weather Service, a distributed system that monitors and forecasts resource performance in Computational Grid environments, to measure and predict CPU availability. We examine the accuracy with which CPU availability can be predicted in an interactive cluster computing environment under production load conditions, and compare these results with a similar study of a production batch system. Our work shows that, in the environments we have studied, the availability of clustered interactive resources is significantly more predictable than that of the batch system.
