USING DATA MINING ALGORITHMS IN WEB PERFORMANCE PREDICTION

This paper presents the application of data mining algorithms to Web performance prediction. Our domain-driven data mining approach uses historical HTTP transaction data that reflect Web performance as perceived by end users located in the Internet domain of Wroclaw University of Technology, Wroclaw, Poland. We compare the predictive modeling features of two general-purpose data mining systems, Microsoft SQL Server and IBM Intelligent Miner, and evaluate neural network, decision tree, time series, and transform regression models. The results show that the data mining algorithms produce fairly accurate predictions, with the best results achieved by IBM's transform regression algorithm.
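The comparison of predictive models described above rests on measuring each model's error against historical observations. The following is a minimal sketch of that idea, not the method used in the paper and not the algorithms in Microsoft SQL Server or IBM Intelligent Miner. It assumes a one-step-ahead walk-forward evaluation with mean absolute error (MAE) as the accuracy metric, and it uses two simple stand-in predictors (a last-value baseline and a moving average). The function names and the download-time series are hypothetical and purely illustrative.

```python
from statistics import mean


def last_value_forecast(history):
    """Naive baseline: predict the next value as the most recent observation."""
    return history[-1]


def moving_average_forecast(history, window=3):
    """Predict the next value as the mean of the last `window` observations."""
    return mean(history[-window:])


def walk_forward_mae(series, predictor, warmup=3):
    """One-step-ahead walk-forward evaluation.

    For each step t >= warmup, predict series[t] using only series[:t]
    (past data), then average the absolute prediction errors.
    """
    errors = [abs(predictor(series[:t]) - series[t]) for t in range(warmup, len(series))]
    return mean(errors)


# Hypothetical download times in seconds (illustrative only, not measured data).
download_times = [1.2, 1.4, 1.1, 1.3, 2.0, 1.8, 1.5, 1.6]

if __name__ == "__main__":
    for name, model in [("last value", last_value_forecast),
                        ("moving average", moving_average_forecast)]:
        print(f"{name}: MAE = {walk_forward_mae(download_times, model):.3f}")
```

In a real study, the stand-in predictors would be replaced by trained models (e.g. neural networks or decision trees), and the walk-forward loop would run over the measured HTTP transaction history so that each prediction uses only data available before the predicted transfer.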
