An evaluation of linear models for host load prediction

This paper evaluates linear models for predicting the Digital Unix five-second host load average from 1 to 30 seconds into the future. A detailed statistical study of a large number of long, fine-grain load traces from a variety of real machines motivates two model families: the Box-Jenkins (1994) models (AR, MA, ARMA, and ARIMA) and, because the traces exhibit self-similarity, the ARFIMA (autoregressive fractionally integrated moving average) models. These models, together with a simple windowed-mean scheme, are then rigorously evaluated by running a large number of randomized test cases on the load traces and data-mining the results. The main conclusions are that host load is consistently predictable to a very useful degree, and that the simpler models, such as AR, are sufficient for this prediction.
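To make the comparison between an AR model and the windowed-mean scheme concrete, the following is a minimal sketch in pure Python. It is not the paper's implementation: the function names, the Yule-Walker fit via the Levinson-Durbin recursion, and the synthetic AR(1) trace standing in for a real load trace are all assumptions chosen for illustration.

```python
import random


def synthetic_trace(n, phi, seed=0):
    """Hypothetical stand-in for a load trace: an AR(1) series with unit-variance noise."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out


def autocovariance(x, max_lag):
    """Biased sample autocovariance for lags 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    return [sum(d[i] * d[i + k] for i in range(n - k)) / n
            for k in range(max_lag + 1)]


def fit_ar(x, p):
    """Fit AR(p) by solving the Yule-Walker equations with Levinson-Durbin.

    Returns (mean, coefficients); coefficients[j] multiplies the value j+1 steps back.
    """
    r = autocovariance(x, p)
    phi, err = [], r[0]
    for k in range(1, p + 1):
        acc = r[k] - sum(phi[j] * r[k - 1 - j] for j in range(k - 1))
        refl = acc / err
        phi = [phi[j] - refl * phi[k - 2 - j] for j in range(k - 1)] + [refl]
        err *= 1.0 - refl * refl
    return sum(x) / len(x), phi


def predict_ar(history, mean, phi, steps):
    """Iterate the AR recursion to predict 1..steps samples ahead."""
    buf = [v - mean for v in history[-len(phi):]]
    out = []
    for _ in range(steps):
        nxt = sum(phi[j] * buf[-1 - j] for j in range(len(phi)))
        buf.append(nxt)
        out.append(nxt + mean)
    return out


def predict_windowed_mean(history, window, steps):
    """Predict every future sample as the mean of the last `window` samples."""
    w = history[-window:]
    return [sum(w) / len(w)] * steps


def mse_at_horizon(trace, start, predictor, horizon):
    """Mean squared error of `predictor(history, horizon)` at a fixed lead time."""
    errs = []
    for t in range(start, len(trace) - horizon + 1):
        pred = predictor(trace[:t], horizon)[-1]
        errs.append((pred - trace[t + horizon - 1]) ** 2)
    return sum(errs) / len(errs)


if __name__ == "__main__":
    trace = synthetic_trace(3000, 0.9, seed=1)
    mean, phi = fit_ar(trace[:2000], 4)
    ar = mse_at_horizon(trace, 2000, lambda h, s: predict_ar(h, mean, phi, s), 5)
    wm = mse_at_horizon(trace, 2000, lambda h, s: predict_windowed_mean(h, 16, s), 5)
    print("AR(4) MSE:", ar, "windowed-mean MSE:", wm)
```

With samples taken once per second, a horizon of 1 to 30 steps would correspond to the 1 to 30 second lead times discussed above; the sampling rate, model order, and window length used here are illustrative choices only.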
