Implementing a Performance Forecasting System for Metacomputing: The Network Weather Service

In this paper we describe the design and implementation of a system called the Network Weather Service (NWS) that takes periodic measurements of deliverable resource performance from distributed networked resources, and uses numerical models to dynamically generate forecasts of future performance levels. These performance forecasts, along with the measures of performance fluctuation (e.g., the mean square prediction error) and forecast lifetime that the NWS generates, are made available to schedulers and other resource management mechanisms at runtime so that they may determine the quality of service that will be available from each resource. We describe the architecture of the NWS and the implementations that we have developed and are currently deploying for the Legion [13] and Globus/Nexus [7] metacomputing infrastructures. We also detail NWS forecasts of resource performance using both the Legion and Globus/Nexus implementations. Our results show that simple forecasting techniques substantially outperform measurements of current conditions (commonly used to gauge resource availability and load) in terms of prediction accuracy. In addition, the techniques we have employed are almost as accurate as substantially more complex modeling methods. We compare our techniques to a sophisticated time-series analysis system in terms of forecasting accuracy and computational complexity.
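As an illustration of the comparison the abstract describes, the sketch below scores two one-step-ahead predictors by mean square prediction error: one that uses the most recent measurement as the forecast (the "current conditions" approach) and one that uses a sliding-window mean. This is only a minimal sketch. The function names, the window size, and the measurement series are assumptions made up for illustration; they are not the NWS forecasters or data from the paper.

```python
from statistics import mean


def last_value_forecast(history):
    """Predict the next value as the most recent measurement."""
    return history[-1]


def sliding_mean_forecast(history, window=5):
    """Predict the next value as the mean of the last `window` measurements."""
    return mean(history[-window:])


def mean_square_error(series, forecaster, warmup=1):
    """Mean square one-step-ahead prediction error over series[warmup:]."""
    errors = []
    for t in range(warmup, len(series)):
        # Only measurements taken before time t are visible to the forecaster.
        prediction = forecaster(series[:t])
        errors.append((series[t] - prediction) ** 2)
    return mean(errors)


# Hypothetical, made-up CPU availability fractions that fluctuate around 0.7.
cpu_availability = [0.80, 0.60, 0.82, 0.58, 0.81, 0.61, 0.79, 0.60, 0.80, 0.59]

mse_last = mean_square_error(cpu_availability, last_value_forecast, warmup=5)
mse_mean = mean_square_error(cpu_availability, sliding_mean_forecast, warmup=5)

print(f"last-value MSE:   {mse_last:.4f}")
print(f"sliding-mean MSE: {mse_mean:.4f}")
```

On a series that fluctuates rapidly like this one, the smoothed predictor yields the lower error. A scheduler could use the error figure alongside the forecast itself as a rough indicator of how much to trust the prediction.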

[1]  Charbel Farhat, et al. Automatic partitioning of unstructured meshes for the parallel solution of problems in computational mechanics, 1993.

[2]  Francine Berman, et al. Modeling the effects of contention on the performance of heterogeneous applications, 1996, Proceedings of 5th IEEE International Symposium on High Performance Distributed Computing.

[3]  Francine Berman, et al. Scheduling from the perspective of the application, 1996, Proceedings of 5th IEEE International Symposium on High Performance Distributed Computing.

[4]  Ravishankar K. Iyer, et al. Predictability of Process Resource Usage: A Measurement-Based Study on UNIX, 1989, IEEE Trans. Software Eng.

[5]  A. Gallant, et al. Seminonparametric Estimation Of Conditionally Constrained Heterogeneous Processes: Asset Pricing Applications, 1989.

[6]  Charbel Farhat, et al. Implicit parallel processing in structural mechanics, 1994.

[7]  Francine Berman, et al. Application-Level Scheduling on Distributed Heterogeneous Networks, 1996, Proceedings of the 1996 ACM/IEEE Conference on Supercomputing.

[8]  Bruce Hendrickson, et al. The Chaco user's guide. Version 1.0, 1993.

[9]  Charbel Farhat. Multiprocessors in computational mechanics, 1987.

[10]  George Tauchen, et al. SNP: A Program for Nonparametric Time Series Analysis. Version 8.4. User's Guide, 1995.

[11]  Stephen Taylor, et al. Forecasting Economic Time Series, 1979.

[12]  C. Farhat, et al. The two-level FETI method for static and dynamic plate problems Part I: An optimal iterative solver for biharmonic systems, 1998.

[13]  Murad S. Taqqu, et al. On the Self-Similar Nature of Ethernet Traffic, 1993, SIGCOMM.

[14]  Stéphane Lanteri, et al. TOP/DOMDEC: a software tool for mesh partitioning and parallel processing and applications to CSM a, 1995.

[15]  Richard Wolski, et al. Forecasting network performance to support dynamic scheduling using the network weather service, 1997, Proceedings of the Sixth IEEE International Symposium on High Performance Distributed Computing.

[16]  Ian T. Foster, et al. Globus: a Metacomputing Infrastructure Toolkit, 1997, Int. J. High Perform. Comput. Appl.

[17]  C. Farhat, et al. A method of finite element tearing and interconnecting and its parallel solution algorithm, 1991.

[18]  Ian T. Foster, et al. The Nexus Approach to Integrating Multithreading and Communication, 1996, J. Parallel Distributed Comput.

[19]  Charbel Farhat, et al. A simple and efficient automatic fem domain decomposer, 1988.

[20]  C. Farhat, et al. Optimal convergence properties of the FETI domain decomposition method, 1994.

[21]  Ian T. Foster, et al. Managing Multiple Communication Methods in High-Performance Networked Computing Systems, 1997, J. Parallel Distributed Comput.

[22]  James C. French, et al. Legion: The Next Logical Step Toward a Nationwide Virtual Computer, 1994.