Automated web usage data mining and recommendation system using K-Nearest Neighbor (KNN) classification method

Abstract A major problem of many online websites is that they present too many choices to the client at once, which makes finding the right product or information on the site strenuous and time-consuming. In this work, we present a study of an automatic web usage data mining and recommendation system based on the current user's behavior, captured through his/her click stream data on a newly developed Really Simple Syndication (RSS) reader website, in order to provide relevant information to the individual without explicitly asking for it. The K-Nearest Neighbor (KNN) classification method was trained for use online and in real time to identify a client's/visitor's click stream data, match it to a particular user group, and recommend a tailored browsing option that meets the needs of that specific user at a particular time. To achieve this, the web users' RSS address file was extracted, cleansed, formatted, and grouped into meaningful sessions, and a data mart was developed. Our results show that the K-Nearest Neighbor classifier is transparent, consistent, straightforward, simple to understand, tends to possess desirable qualities, and is easier to implement than most other machine learning techniques, specifically when there is little or no prior knowledge about the data distribution.
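The classification step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the feature layout (click counts per feed category), the user-group labels, the training sessions, the recommendation table, and the function names `classify_session` and `recommend` are all hypothetical, chosen only to show how a KNN vote could map a visitor's current session to a user group and then to a suggested feed.

```python
from collections import Counter
from math import dist  # Euclidean distance, available in Python 3.8+

# Hypothetical training sessions (illustrative data, not from the paper):
# each vector holds click counts for (news, sports, technology) feeds,
# paired with an assumed user-group label.
TRAINING_SESSIONS = [
    ((5, 0, 1), "news_reader"),
    ((4, 1, 0), "news_reader"),
    ((0, 6, 1), "sports_fan"),
    ((1, 5, 0), "sports_fan"),
    ((0, 1, 6), "tech_follower"),
    ((1, 0, 5), "tech_follower"),
]

# Hypothetical mapping from user group to a recommended feed category.
RECOMMENDATIONS = {
    "news_reader": "world news feeds",
    "sports_fan": "sports feeds",
    "tech_follower": "technology feeds",
}


def classify_session(session, training=TRAINING_SESSIONS, k=3):
    """Return the majority user group among the k nearest training sessions."""
    # Sort labelled sessions by Euclidean distance to the current session
    # and keep the k closest ones.
    nearest = sorted(training, key=lambda item: dist(session, item[0]))[:k]
    # Majority vote over the labels of those neighbours.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def recommend(session, k=3):
    """Map the predicted user group to its (assumed) recommended feeds."""
    return RECOMMENDATIONS[classify_session(session, k=k)]
```

Calling `recommend((0, 4, 1))` classifies a session dominated by sports clicks and looks up the suggestion for that group. Because KNN stores the labelled sessions and makes no assumption about how they are distributed, new sessions can be added to the training list without retraining a model, which fits the abstract's remark about cases with little prior knowledge of the data distribution.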
