Towards Web Performance Mining

Web mining is the application of data mining to discover useful knowledge from the Web. Web mining currently focuses on four main research directions, corresponding to the categories of Web data: Web content mining, Web usage mining, Web structure mining, and Web user profile mining. Web content mining discovers what Web pages are about and reveals new knowledge from them. Web usage mining identifies patterns in user navigation through Web pages and is performed for the purposes of service personalization, system improvement, and usage characterization. Web structure mining investigates how Web documents are structured and discovers the model underlying the link structures of the WWW. Web user profile mining discovers user profiles based on users' behavior on the Web. We present the application of data mining to Web performance analysis and call our approach Web performance mining (WPM). It is defined to characterize performance from the perspective of Web clients, in terms of the data transfer throughput in Web transactions. WPM adds a new dimension to Web mining research: it uses data mining techniques to analyze Web performance measurements and find interesting patterns that support decision-making in the use of the Web, for example, to predict a future state of good or poor performance in access to particular Web servers. WPM is based on measurements that are planned and performed using specific measurement tools and platforms. We developed the multi-agent distributed system MWING to support the required active measurements.
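
As a rough illustration of the kind of prediction mentioned above, the following minimal sketch labels throughput measurements as good or poor and predicts the label for a given hour of day by majority vote. It is not the algorithm used in WPM or MWING; the function names, the hour-of-day feature, the throughput threshold, and the sample values are all assumptions made for this example.

```python
from collections import defaultdict
from statistics import median


def label_measurements(samples, threshold=None):
    """Label each (hour_of_day, throughput_kbps) sample as 'good' or 'poor'.

    If no threshold is given, the median throughput is used (an assumption
    made for this sketch, not a rule taken from the paper).
    """
    if threshold is None:
        threshold = median(t for _, t in samples)
    return [(h, "good" if t >= threshold else "poor") for h, t in samples]


def fit_hourly_majority(labeled):
    """Build a per-hour majority-vote model from labeled samples."""
    counts = defaultdict(lambda: {"good": 0, "poor": 0})
    for hour, label in labeled:
        counts[hour][label] += 1
    # Ties are resolved in favour of 'good' (arbitrary choice for the sketch).
    return {
        hour: "good" if c["good"] >= c["poor"] else "poor"
        for hour, c in counts.items()
    }


def predict(model, hour, default="poor"):
    """Predict the performance class for an hour; unseen hours get `default`."""
    return model.get(hour, default)


# Hypothetical measurements: (hour of day, throughput in kbit/s).
samples = [(9, 120.0), (9, 100.0), (9, 800.0),
           (3, 900.0), (3, 850.0), (3, 150.0)]
model = fit_hourly_majority(label_measurements(samples, threshold=500.0))
print(predict(model, 9), predict(model, 3))
```

A real analysis would draw on many more measured attributes and on established data mining algorithms; the majority vote here only shows the shape of the task, which is to map past measurements to a predicted performance class.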
