Data mining meets network analysis: Traffic prediction models

Most research on network traffic prediction has relied on small datasets and statistical methods. This study analyzes an internet traffic dataset spanning multiple months using the data mining process. Each data mining phase was adapted to the network analysis domain and organized systematically within that process. The second part of the paper evaluates several univariate seasonal time series prediction models, including ANN, ARIMA, and Holt-Winters, among others, as a data mining phase on the given dataset. The experiments show that, in most cases, ANNs outperform the other algorithms for this purpose.
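To make the idea of a univariate seasonal prediction model concrete, the following is a minimal sketch of additive Holt-Winters smoothing, one of the model families named above. It is an illustration only and not the implementation used in the paper: the function names, the smoothing parameters (`alpha`, `beta`, `gamma`), the initialization scheme, and the toy series are all assumptions chosen for brevity.

```python
from statistics import mean


def holt_winters_additive(series, season_length, horizon,
                          alpha=0.3, beta=0.1, gamma=0.2):
    """Forecast `horizon` steps ahead with additive Holt-Winters smoothing.

    Hypothetical helper for illustration; the parameter defaults are
    assumptions, not values taken from the paper.
    """
    m = season_length
    if len(series) < 2 * m:
        raise ValueError("need at least two full seasons of data")

    # Initialize level, trend, and seasonal indices from the first two seasons.
    first, second = series[:m], series[m:2 * m]
    level = mean(first)
    trend = (mean(second) - mean(first)) / m
    seasonal = [y - level for y in first]

    # Update the three components one observation at a time.
    for t, y in enumerate(series):
        s_prev = seasonal[t % m]
        new_level = alpha * (y - s_prev) + (1 - alpha) * (level + trend)
        trend = beta * (new_level - level) + (1 - beta) * trend
        seasonal[t % m] = gamma * (y - new_level) + (1 - gamma) * s_prev
        level = new_level

    # Project the level and trend forward and add the matching seasonal index.
    n = len(series)
    return [level + h * trend + seasonal[(n + h - 1) % m]
            for h in range(1, horizon + 1)]


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


if __name__ == "__main__":
    # Toy traffic-like series with a period of 4 samples (made-up numbers).
    history = [10.0, 20.0, 30.0, 20.0] * 3
    holdout = [10.0, 20.0, 30.0, 20.0]
    forecast = holt_winters_additive(history, season_length=4, horizon=4)
    print("forecast:", forecast)
    print("MAE:", mean_absolute_error(holdout, forecast))
```

Holding out the last season and scoring the forecast with an error measure such as the mean absolute error is one simple way to compare such a model against alternatives like ARIMA or an ANN on the same series.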
