Data Preprocessing on Web Server Log Files for Mining Users Access Patterns

Web Usage Mining (WUM) is the application of data mining techniques to discover knowledge hidden in web log files, such as user access patterns, and to analyze users' behavioral patterns. The discovered knowledge can also support various website design tasks. However, the data stored in a web log file contain a large amount of erroneous, misleading, and incomplete information. Preprocessing, one of the important phases of WUM, is therefore needed to transform a log into a set of web user sessions that are suitable for analysis. A sample web log file was collected from the web server at NASA Kennedy Space Center. This study focuses on web log preprocessing methods that can be used for session identification. The study also produces statistical information about user sessions, such as: (1) total unique IPs; (2) total unique pages; (3) total sessions; (4) session length; and (5) the frequency of visited pages. After preprocessing is completed, the results will be used for mining user access patterns.
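
The abstract does not state the exact cleaning rules or session definition, so the Python sketch below fills them in with common choices from the WUM literature, all of which are assumptions rather than the study's own method. It parses Common Log Format lines (the format of the NASA Kennedy Space Center log), keeps successful GET requests, drops embedded image, style, and script requests (the `IGNORED_SUFFIXES` list is hypothetical), splits each host's requests into sessions using an assumed 30-minute inactivity timeout (`SESSION_TIMEOUT`), and computes the five statistics. Because the NASA log records a hostname when one could be resolved and an IP address otherwise, "unique IPs" is approximated by unique host values, and session length is assumed to mean the number of page requests in a session.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime, timedelta

# Common Log Format (used by the NASA Kennedy Space Center HTTP logs):
#   host ident authuser [day/month/year:hh:mm:ss zone] "request" status bytes
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)$'
)

# ASSUMED cleaning rule and session timeout; the abstract does not specify them.
IGNORED_SUFFIXES = ('.gif', '.jpg', '.jpeg', '.png', '.bmp', '.xbm', '.ico', '.css', '.js')
SESSION_TIMEOUT = timedelta(minutes=30)


def parse_line(line):
    """Return (host, timestamp, method, url, status), or None for a malformed line."""
    match = CLF_PATTERN.match(line.strip())
    if match is None:
        return None
    parts = match.group('request').split()
    if len(parts) < 2:  # empty or truncated request string
        return None
    try:
        timestamp = datetime.strptime(match.group('time'), '%d/%b/%Y:%H:%M:%S %z')
    except ValueError:
        return None
    return match.group('host'), timestamp, parts[0], parts[1], int(match.group('status'))


def clean(records):
    """Keep successful (2xx) GET requests for pages; drop embedded resources."""
    for host, timestamp, method, url, status in records:
        if method != 'GET' or not 200 <= status < 300:
            continue
        path = url.split('?', 1)[0]  # ignore query strings when identifying a page
        if path.lower().endswith(IGNORED_SUFFIXES):
            continue
        yield host, timestamp, path


def sessionize(entries, timeout=SESSION_TIMEOUT):
    """Split each host's requests into sessions wherever the gap exceeds `timeout`."""
    by_host = defaultdict(list)
    for host, timestamp, page in entries:
        by_host[host].append((timestamp, page))
    sessions = []
    for host, hits in by_host.items():
        hits.sort()  # chronological order within each host
        current = [hits[0][1]]
        for (prev_timestamp, _), (timestamp, page) in zip(hits, hits[1:]):
            if timestamp - prev_timestamp > timeout:
                sessions.append((host, current))
                current = []
            current.append(page)
        sessions.append((host, current))
    return sessions


def session_statistics(sessions):
    """Compute unique hosts, unique pages, session count, lengths, page frequency."""
    page_frequency = Counter(page for _, pages in sessions for page in pages)
    return {
        'unique_hosts': len({host for host, _ in sessions}),
        'unique_pages': len(page_frequency),
        'total_sessions': len(sessions),
        'session_lengths': [len(pages) for _, pages in sessions],  # pages per session
        'page_frequency': page_frequency,
    }


def preprocess(lines):
    """Parse, clean, and sessionize raw log lines, then summarize the sessions."""
    records = (r for r in map(parse_line, lines) if r is not None)
    return session_statistics(sessionize(clean(records)))


# Hand-written lines in Common Log Format (NOT taken from the real NASA log).
SAMPLE = [
    '192.0.2.1 - - [01/Jul/1995:00:00:01 -0400] "GET /history/apollo/ HTTP/1.0" 200 6245',
    'client.example.com - - [01/Jul/1995:00:00:06 -0400] "GET /shuttle/countdown/ HTTP/1.0" 200 3985',
    '192.0.2.1 - - [01/Jul/1995:00:00:09 -0400] "GET /images/logo.gif HTTP/1.0" 200 786',
    'client.example.com - - [01/Jul/1995:00:01:00 -0400] "GET /missing.html HTTP/1.0" 404 -',
    '192.0.2.1 - - [01/Jul/1995:00:10:00 -0400] "GET /shuttle/missions/ HTTP/1.0" 200 1000',
    '192.0.2.1 - - [01/Jul/1995:00:45:00 -0400] "GET /shuttle/countdown/ HTTP/1.0" 200 3985',
]

if __name__ == '__main__':
    for key, value in preprocess(SAMPLE).items():
        print(key, value)
```

On the hand-written sample, cleaning removes the `.gif` request and the 404 request. Host 192.0.2.1 then yields two sessions, because its last request arrives 35 minutes after the previous one, which exceeds the assumed timeout. A different timeout, or a referrer-based heuristic, would change the session boundaries.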
