Network bandwidth utilization forecast model on high bandwidth networks

With the increasing number of geographically distributed scientific collaborations and the growing size of scientific data, it has become challenging for users to achieve the best possible network performance on a shared network. We have developed a model to forecast expected bandwidth utilization on high-bandwidth wide area networks. The forecast model can improve the efficiency of resource utilization and the scheduling of data movements on high-bandwidth networks to accommodate the ever-increasing data volumes of large-scale scientific applications. The model is univariate and is built with STL (seasonal-trend decomposition based on loess) and ARIMA on SNMP path utilization data. Compared with a traditional approach such as the Box-Jenkins methodology for training the ARIMA model, our forecast model reduces computation time by 83.2%. It is also resilient to abrupt changes in network usage, and its forecast errors are within the standard deviation of the monitored measurements.
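
The decompose-then-forecast idea behind the model can be illustrated with a minimal sketch. It is not the authors' implementation: per-phase seasonal means stand in for STL's loess-based decomposition, a least-squares AR(1) fit stands in for a full ARIMA model, and no trend component is estimated. The function names, the hourly sampling, and the sample values are all hypothetical.

```python
from statistics import mean


def seasonal_means(series, period):
    """Mean of each phase of the cycle, centered so the offsets average to zero.

    Simplified stand-in for the seasonal component that STL would estimate.
    """
    raw = [mean(series[i::period]) for i in range(period)]
    overall = mean(raw)
    return [s - overall for s in raw]


def fit_ar1(x):
    """Least-squares fit of x[t] - mu = phi * (x[t-1] - mu).

    Simplified stand-in for ARIMA; returns (mu, phi).
    """
    mu = mean(x)
    d = [v - mu for v in x]
    num = sum(a * b for a, b in zip(d[1:], d[:-1]))
    den = sum(a * a for a in d[:-1])
    phi = num / den if den else 0.0  # constant remainder: no autocorrelation
    return mu, phi


def forecast(series, period, steps):
    """Remove the seasonal offsets, forecast the remainder with AR(1),
    then add the seasonal offsets back for each future step."""
    seasonal = seasonal_means(series, period)
    remainder = [v - seasonal[i % period] for i, v in enumerate(series)]
    mu, phi = fit_ar1(remainder)
    deviation = remainder[-1] - mu
    n = len(series)
    out = []
    for h in range(1, steps + 1):
        deviation *= phi  # AR(1) deviation decays toward the mean level
        out.append(mu + deviation + seasonal[(n + h - 1) % period])
    return out


if __name__ == "__main__":
    # Hypothetical path utilization samples (percent), four samples per cycle.
    utilization = [12, 35, 58, 30, 14, 33, 61, 28, 11, 36, 57, 31]
    print(forecast(utilization, period=4, steps=4))
```

Forecasting the deseasonalized remainder rather than the raw series keeps the fitted model small, which is one plausible reason a decomposition step could shorten training compared with a full Box-Jenkins model search on the raw data.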
