Adaptive Performance Prediction for Distributed Data-Intensive Applications

The computational grid is becoming the platform of choice for large-scale distributed data-intensive applications. Accurately predicting the transfer times of remote data files, a fundamental component of such applications, is critical to achieving good application performance. In this paper, we introduce a performance prediction method, AdRM (Adaptive Regression Modeling), that estimates file transfer times for network-bound distributed data-intensive applications. We demonstrate the effectiveness of AdRM on two distributed data applications, SARA (Synthetic Aperture Radar Atlas) and SRB (Storage Resource Broker), and discuss how it can be used for application scheduling. Our experiments use the Network Weather Service [36, 37], a resource performance measurement and forecasting facility, as the basis for the performance prediction model. Our initial findings indicate that AdRM can accurately predict data transfer times in wide-area multi-user grid environments.
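The abstract does not give the form of the regression, so the following is only a rough illustration of the general idea of adaptive regression for transfer-time prediction, not the AdRM method itself. It assumes a simple linear model refit over a sliding window of recent transfers, where the predictor x is a naive estimate (file size divided by a forecast bandwidth, such as one a measurement service might supply) and y is the measured transfer time. The class name, window size, and sample values are all hypothetical.

```python
from collections import deque


class SlidingWindowRegressor:
    """Hypothetical helper: least-squares line fit over recent samples only."""

    def __init__(self, window=20):
        # Old samples fall out automatically, so the fit adapts over time.
        self.samples = deque(maxlen=window)

    def observe(self, naive_estimate, measured_time):
        """Record one completed transfer (both values in seconds)."""
        self.samples.append((naive_estimate, measured_time))

    def predict(self, naive_estimate):
        """Predict a transfer time from a naive size/bandwidth estimate."""
        n = len(self.samples)
        if n == 0:
            raise ValueError("no observations yet")
        mean_x = sum(x for x, _ in self.samples) / n
        mean_y = sum(y for _, y in self.samples) / n
        sxx = sum((x - mean_x) ** 2 for x, _ in self.samples)
        if sxx == 0:
            # A single point or identical x values: fall back to the mean.
            return mean_y
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in self.samples)
        slope = sxy / sxx
        intercept = mean_y - slope * mean_x
        return intercept + slope * naive_estimate


# Illustrative values only (assumed, not measured data).
model = SlidingWindowRegressor(window=3)
for naive, measured in [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]:
    model.observe(naive, measured)

file_size_mb = 40.0           # assumed file size
forecast_bandwidth_mbps = 10  # assumed bandwidth forecast, in MB/s
prediction = model.predict(file_size_mb / forecast_bandwidth_mbps)
print(f"predicted transfer time: {prediction:.2f} s")
```

A scheduler could compare such predictions across candidate data servers and pick the one with the smallest predicted time; the window length trades responsiveness to changing network conditions against noise in the fit.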
