AN OVERVIEW OF PREPROCESSING OF WEB LOG FILES FOR WEB USAGE MINING

As Internet usage grows in popularity and the number of users rises steadily, the World Wide Web has become a huge repository of data and an important platform for disseminating information. Users' accesses to Web sites are recorded in Web server logs. However, the raw data in these log files do not present an accurate picture of users' accesses to the Web site. Preprocessing the Web log data is therefore an essential prerequisite before the data can be used for knowledge discovery or mining tasks. The preprocessed data are then suitable for Web mining, the discovery and analysis of useful information from the Web. Web usage mining, a category of Web mining, is the application of data mining techniques to discover usage patterns from clickstream and associated data stored in one or more Web servers. This paper presents an overview of the steps involved in the preprocessing stage.
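To make concrete why raw log entries can misrepresent user accesses, the following is a minimal sketch of one cleaning pass. The abstract does not enumerate the preprocessing steps, so everything here is an assumption made for illustration: the entries are assumed to follow the Common Log Format, and the names `CLF_PATTERN`, `IGNORED_SUFFIXES`, `parse_line`, `clean`, and `SAMPLE_LOG`, as well as the particular filtering rules (keep only successful GET requests, drop embedded resources such as images and stylesheets), are hypothetical choices rather than the method described in the paper.

```python
import re

# Assumed input: Common Log Format lines of the form
#   host ident authuser [timestamp] "METHOD url PROTOCOL" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

# Assumption: requests for these file types are embedded resources,
# not pages the user explicitly asked for.
IGNORED_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")


def parse_line(line):
    """Parse one log line into a dict, or return None if it is malformed."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    return record


def clean(lines):
    """Keep only successful GET requests for pages (illustrative rules)."""
    kept = []
    for line in lines:
        record = parse_line(line)
        if record is None:
            continue  # malformed entry
        if record["method"] != "GET":
            continue  # not a page retrieval
        if not 200 <= record["status"] < 300:
            continue  # failed or redirected request
        path = record["url"].split("?", 1)[0].lower()
        if path.endswith(IGNORED_SUFFIXES):
            continue  # embedded resource, not an explicit page view
        kept.append(record)
    return kept


# Hypothetical sample entries, invented for this sketch.
SAMPLE_LOG = [
    '192.0.2.1 - - [10/Oct/2012:13:55:36 +0000] "GET /index.html HTTP/1.0" 200 2326',
    '192.0.2.1 - - [10/Oct/2012:13:55:37 +0000] "GET /logo.gif HTTP/1.0" 200 512',
    '192.0.2.7 - - [10/Oct/2012:13:56:02 +0000] "GET /missing.html HTTP/1.0" 404 0',
    '192.0.2.7 - - [10/Oct/2012:13:56:10 +0000] "POST /form HTTP/1.0" 200 120',
]

if __name__ == "__main__":
    for record in clean(SAMPLE_LOG):
        print(record["host"], record["time"], record["url"])
```

In this sketch only the first sample entry survives: the image request, the failed request, and the POST request are each discarded by one of the rules. A real deployment would need rules chosen for the site in question.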
