A Survey on Data Preprocessing in Web Usage Mining

With the widespread use of the Internet and the steady growth in the number of its users, the World Wide Web holds a vast amount of data. An important source of information about how users access Web sites is the data stored in Web server logs. Analyzing log files has attracted growing interest because they reflect the actual usage of a Web site. Raw log data are inaccurate, however, so Web log files must be preprocessed before they are suitable for knowledge discovery or mining tasks. Web Usage Mining, a branch of Web mining and an application of data mining, is the automatic discovery of patterns in clickstreams and associated data collected or generated as a result of user interactions with one or more Web sites. This survey reviews the literature and gives an overview of the steps needed in the preprocessing phase.

Keywords - Data Fusion, Path Completion, Preprocessing, Session Identification, Web Usage, Web Server Log File.
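The preprocessing phase described above can be illustrated with a minimal Python sketch covering data cleaning and session identification. It assumes input in the Apache Common Log Format, treats each client IP address as one user, drops requests for embedded resources (images, stylesheets, scripts) and non-2xx responses, and starts a new session when consecutive requests from the same user are more than 30 minutes apart, a commonly used timeout heuristic. The function names, the suffix list and the timeout value are illustrative assumptions, not the method of any particular surveyed work; data fusion and path completion are not shown.

```python
import re
from datetime import datetime, timedelta

# Assumed input format: Apache Common Log Format (CLF).
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) \S+'
)

# Hypothetical cleaning rule: requests for these resources are treated as noise.
IRRELEVANT_SUFFIXES = ('.gif', '.jpg', '.jpeg', '.png', '.css', '.js', '.ico')
# Assumed session timeout (a common heuristic, not a fixed standard).
SESSION_TIMEOUT = timedelta(minutes=30)


def parse_line(line):
    """Parse one CLF line into a dict, or return None if it does not match."""
    m = CLF_PATTERN.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    # %b expects English month abbreviations (e.g. "Oct").
    rec['time'] = datetime.strptime(rec['time'], '%d/%b/%Y:%H:%M:%S %z')
    rec['status'] = int(rec['status'])
    return rec


def clean(records):
    """Data cleaning: keep only successful GET requests for pages."""
    return [
        r for r in records
        if r['method'] == 'GET'
        and 200 <= r['status'] < 300
        and not r['url'].lower().endswith(IRRELEVANT_SUFFIXES)
    ]


def identify_sessions(records):
    """User and session identification.

    Each host (IP address) is treated as one user, which conflates users
    behind a shared proxy; a user's requests are split into a new session
    whenever the idle gap exceeds SESSION_TIMEOUT.
    """
    by_user = {}
    for r in sorted(records, key=lambda r: r['time']):
        by_user.setdefault(r['host'], []).append(r)

    sessions = []
    for host, reqs in by_user.items():
        current = [reqs[0]]
        for prev, cur in zip(reqs, reqs[1:]):
            if cur['time'] - prev['time'] > SESSION_TIMEOUT:
                sessions.append((host, [x['url'] for x in current]))
                current = []
            current.append(cur)
        sessions.append((host, [x['url'] for x in current]))
    return sessions


if __name__ == '__main__':
    log = [
        '10.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
        '10.0.0.1 - - [10/Oct/2000:13:55:37 -0700] "GET /logo.gif HTTP/1.0" 200 512',
        '10.0.0.1 - - [10/Oct/2000:14:40:00 -0700] "GET /products.html HTTP/1.0" 200 1024',
        '10.0.0.2 - - [10/Oct/2000:13:56:00 -0700] "GET /missing.html HTTP/1.0" 404 0',
    ]
    records = [r for r in map(parse_line, log) if r is not None]
    print(identify_sessions(clean(records)))
```

On this sample, the image request and the failed request are removed by cleaning, and the two remaining page views of `10.0.0.1` fall into separate sessions because they are about 44 minutes apart: `[('10.0.0.1', ['/index.html']), ('10.0.0.1', ['/products.html'])]`.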
