A Novel Technique for Web Log Mining with Better Data Cleaning and Transaction Identification

Problem statement: In the internet era, web sites are a useful source of information for almost every activity. As a result, the World Wide Web has grown rapidly in traffic volume and in the size and complexity of its web sites. Web mining applies data mining, artificial intelligence, chart technology and related methods to web data; it traces users' visiting behavior and extracts their interests as patterns. Because of its direct application in e-commerce, web analytics, e-learning and information retrieval, web mining has become an important area of computer and information science. Several techniques exist, such as web usage mining, but each has its own disadvantages. This study focuses on techniques for better data cleaning and transaction identification from the web log. Approach: Log data is usually noisy and ambiguous, so preprocessing is an important step for efficient mining. During preprocessing, data cleaning removes records of graphics, videos and format information, records with a failed HTTP status code, and robot-generated records. Sessions are reconstructed and paths are completed by appending missing pages. Transactions, which depict user behavior, are also constructed accurately during preprocessing by calculating the Reference Lengths of user accesses while taking the byte rate into account. Results: In terms of record count, data cleaning reduces a log of 1000 records, for example, to only 350. In terms of execution time, the initial log takes 119 seconds to process, whereas the proposed technique requires only 52 seconds. Conclusion: The experimental results demonstrate the performance of the proposed algorithm, which gives better results for web usage mining than existing approaches.
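The data cleaning step described in the approach can be illustrated with a minimal sketch. It is not the authors' implementation: the log format (Apache-style Common/Combined Log Format), the list of media suffixes, the robot markers, the rule that a status of 400 or above counts as failed, and the names `is_relevant` and `clean_log` are all assumptions made for illustration.

```python
import re

# Assumed input format: Common Log Format, optionally followed by the
# referrer and user-agent fields of the Combined Log Format.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) (?P<bytes>\S+)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)

# Hypothetical filter lists; a real deployment would tune these.
MEDIA_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".ico",
                  ".css", ".js", ".mp4", ".avi")
ROBOT_MARKERS = ("bot", "crawler", "spider")


def is_relevant(line):
    """Return True if a log line should survive data cleaning."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return False  # unparseable line
    url = match.group("url").split("?", 1)[0].lower()
    if url.endswith(MEDIA_SUFFIXES):
        return False  # graphics, video or format (style/script) request
    if int(match.group("status")) >= 400:
        return False  # assumed definition of a failed HTTP status
    if url == "/robots.txt":
        return False  # a request typical of crawlers
    agent = (match.group("agent") or "").lower()
    if any(marker in agent for marker in ROBOT_MARKERS):
        return False  # user agent identifies itself as a robot
    return True


def clean_log(lines):
    """Keep only the log lines that pass every cleaning rule."""
    return [line for line in lines if is_relevant(line)]
```

Applied to a list of raw log lines, `clean_log` returns the subset that would be passed on to session reconstruction and transaction identification; the fraction removed depends entirely on the log and on the assumed filter lists.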
