An effective data preprocessing method for Web Usage Mining

Web Usage Mining (WUM) is a category of data mining that identifies usage patterns in web data in order to understand and better serve the requirements of web applications. WUM involves three steps: preprocessing, pattern discovery, and analysis. Preprocessing, the first step, is essential because it improves the quality of the data and, in turn, the mining results. This paper studies and presents several techniques for preparing the access stream before mining begins; these techniques improve the performance of data preprocessing in identifying unique sessions and unique users. The proposed methods help discover meaningful patterns and relationships in users' access streams, and various research tests have shown them to be valid and useful. The paper concludes by proposing future research directions in this area.
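To make the preprocessing step concrete, the following is a minimal sketch of a common pipeline: data cleaning, user identification, and session identification. It is not the method proposed in the paper. The record layout, the `(ip, user_agent)` pair as a stand-in for a unique user, the list of ignored file suffixes, and the 30-minute inactivity timeout are all assumptions chosen for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical, already-parsed log records: (ip, user_agent, timestamp, url, status)
IGNORED_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js")  # assumed resource types
SESSION_TIMEOUT = timedelta(minutes=30)  # assumed heuristic, not taken from the paper


def clean(records):
    """Keep successful page requests; drop embedded resources and failed requests."""
    return [
        r for r in records
        if r[4] == 200 and not r[3].lower().endswith(IGNORED_SUFFIXES)
    ]


def identify_users(records):
    """Group requests by (ip, user_agent), an assumed proxy for a unique user."""
    users = defaultdict(list)
    for ip, agent, ts, url, _status in records:
        users[(ip, agent)].append((ts, url))
    return users


def sessionize(users):
    """Split each user's requests into sessions at gaps longer than the timeout."""
    sessions = []
    for user, hits in users.items():
        hits.sort()  # order by timestamp
        current = [hits[0]]
        for prev, hit in zip(hits, hits[1:]):
            if hit[0] - prev[0] > SESSION_TIMEOUT:
                sessions.append((user, current))
                current = []
            current.append(hit)
        sessions.append((user, current))
    return sessions


# Made-up sample data for illustration only.
t0 = datetime(2024, 1, 1, 10, 0)
records = [
    ("10.0.0.1", "Firefox", t0, "/index.html", 200),
    ("10.0.0.1", "Firefox", t0 + timedelta(seconds=1), "/logo.png", 200),
    ("10.0.0.1", "Firefox", t0 + timedelta(minutes=5), "/products.html", 200),
    ("10.0.0.1", "Firefox", t0 + timedelta(minutes=50), "/index.html", 200),
    ("10.0.0.1", "Chrome", t0 + timedelta(minutes=2), "/index.html", 200),
    ("10.0.0.2", "Chrome", t0 + timedelta(minutes=3), "/missing.html", 404),
]

users = identify_users(clean(records))
sessions = sessionize(users)
```

In this sample, cleaning removes the image request and the failed request, the same IP address with two different user agents is treated as two users, and the 45-minute gap splits one user's requests into two sessions. Real deployments often refine these heuristics with cookies, referrer fields, or site topology.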
