A Comprehensive Survey on Data Preprocessing Methods in Web Usage Mining

Web usage mining applies data mining techniques to web server log files to extract information about users' interests. Companies widely use it to analyze customer interests and to predict the future of their business, and it is applied in fields such as e-business, e-commerce, and e-learning. Web usage mining comprises three phases: data preprocessing, pattern discovery, and pattern analysis. Data preprocessing is an essential preliminary step that ensures the quality of the input data: the raw data from the web server log file is preprocessed to eliminate noisy, vague, and redundant entries so that mining is efficient. Preprocessing itself involves several phases, namely field extraction and data cleaning, user identification, session identification, path completion, and transaction identification. In this paper, we discuss the research carried out on data cleaning and the attributes considered in the cleaning process.

Keywords: Web usage mining, Data preprocessing, Session identification, Path completion, User identification
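To make the first preprocessing phases concrete, the following is a minimal Python sketch of field extraction, data cleaning, and a simple form of user and session identification. It is an illustration under stated assumptions, not a method taken from any of the surveyed works: the log is assumed to be in Common Log Format, the cleaning attributes (file extension, HTTP method, status code) are a commonly used but hypothetical choice, users are approximated by IP address, and the 30-minute session timeout is an assumed heuristic. All function names are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Assumed input: Common Log Format lines (the survey does not fix a format).
CLF = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) \S+'
)

# Assumed cleaning attribute: embedded resources are treated as noise.
IGNORED_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")


def extract_fields(line):
    """Field extraction: split one raw log line into named fields."""
    match = CLF.match(line)
    if match is None:
        return None  # malformed line
    record = match.groupdict()
    record["status"] = int(record["status"])
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    return record


def is_relevant(record):
    """Data cleaning: keep successful GET requests for pages only."""
    path = record["url"].split("?", 1)[0].lower()
    return (
        record["method"] == "GET"
        and 200 <= record["status"] < 300
        and not path.endswith(IGNORED_EXTENSIONS)
    )


def clean(lines):
    """Run field extraction and cleaning over raw log lines."""
    records = (extract_fields(line) for line in lines)
    return [r for r in records if r is not None and is_relevant(r)]


def sessionize(records, timeout=timedelta(minutes=30)):
    """User and session identification: one user per IP (an assumption),
    with a new session whenever the inactivity gap exceeds the timeout."""
    open_sessions = {}  # ip -> that user's most recent session
    sessions = []
    for record in sorted(records, key=lambda r: r["time"]):
        current = open_sessions.get(record["ip"])
        if current is None or record["time"] - current[-1]["time"] > timeout:
            current = []
            sessions.append(current)
            open_sessions[record["ip"]] = current
        current.append(record)
    return sessions


SAMPLE_LOG = [
    '10.0.0.1 - - [10/Oct/2015:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '10.0.0.1 - - [10/Oct/2015:13:55:37 +0000] "GET /logo.png HTTP/1.1" 200 512',
    '10.0.0.1 - - [10/Oct/2015:13:56:10 +0000] "GET /missing.html HTTP/1.1" 404 0',
    '10.0.0.2 - - [10/Oct/2015:14:00:00 +0000] "GET /courses.html HTTP/1.1" 200 1500',
    '10.0.0.1 - - [10/Oct/2015:15:00:00 +0000] "GET /about.html HTTP/1.1" 200 900',
]

if __name__ == "__main__":
    for session in sessionize(clean(SAMPLE_LOG)):
        print(session[0]["ip"], [r["url"] for r in session])
```

In this sketch the image request and the failed request are discarded during cleaning, and the two remaining requests from 10.0.0.1 fall into separate sessions because they are more than 30 minutes apart. Path completion and transaction identification are not covered here.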

[1]  K. Sudheer Reddy,et al.  An effective data preprocessing method for Web Usage Mining , 2013, 2013 International Conference on Information Communication and Embedded Systems (ICICES).

[2]  H. S. Guruprasad,et al.  A New Web Usage Mining Approach for Website Recommendations Using Concept Hierarchy and Website Graph , 2014 .

[3]  Theint Theint Aye,et al.  Web log cleaning for mining of web usage patterns , 2011, 2011 3rd International Conference on Computer Research and Development.

[4]  Bhavani M. Thuraisingham,et al.  Web Data Mining and Applications in Business Intelligence and Counter-Terrorism , 2003 .

[5]  Zuraini Ismail,et al.  Enhanced Web Log Cleaning Algorithm for Web Intrusion Detection , 2014, IC2IT.

[6]  Yiqun Liu,et al.  Data cleansing for Web information retrieval using query independent features , 2007, J. Assoc. Inf. Sci. Technol..

[7]  Abdelaziz Marzak,et al.  Web Usage Mining data preprocessing and multi level analysis on Moodle , 2013, 2013 ACS International Conference on Computer Systems and Applications (AICCSA).

[8]  Jaideep Srivastava,et al.  Web mining: information and pattern discovery on the World Wide Web , 1997, Proceedings Ninth IEEE International Conference on Tools with Artificial Intelligence.

[9]  Darshak B. Mehta,et al.  Web Usage Mining Using Association Rule Mining on Clustered Data for Pattern Discovery , 2013 .

[10]  Addanki Ramya,et al.  Preprocessing and Unsupervised Approach For Web Usage Mining , 2012 .

[11]  Jaideep Srivastava,et al.  A Novel Technique for Sessions Identification in Web Usage Mining Preprocessing , 2011 .

[12]  Franco Turini,et al.  Preprocessing and Mining Web Log Data for Web Personalization , 2003, AI*IA.

[13]  P. K. Mishra,et al.  Analysis of Data Extraction and Data Cleaning in Web Usage Mining , 2015, ICARCSET '15.

[14]  Rinkle Rani Aggarwal,et al.  An Efficient Algorithm for Data Cleaning of Log File using File Extensions , 2012 .

[15]  Pankaj M. Meshram,et al.  Mining of Web Logs Using Preprocessing andClustering , 2014 .

[16]  Shashi Sahu A Survey on Frequent Web Page Mining with Improving Data Quality of Log Cleaner , 2015 .