A k-Nearest Neighbors Method for Classifying User Sessions in E-Commerce Scenario

This paper addresses the classification of user sessions in an online store into two classes: buying sessions (during which a purchase confirmation occurs) and browsing sessions. Because the interactions connected with a purchase confirmation are typically completed at the end of a session, information describing an active session can be observed earlier and used to assess the probability that the session ends in a purchase. The authors formulate the prediction of buying sessions in a Web store as a supervised classification problem with two target classes, defined by whether or not a purchase transaction is finalized in the session, and a feature vector of variables describing the session. The presented approach uses k-Nearest Neighbors (k-NN) classification. A k-NN classifier was built from historical data obtained from the log files of an online bookstore, and its effectiveness was verified for different neighborhood sizes. An 11-NN classifier was the most effective in terms of both buying-session predictions and overall predictions, achieving a sensitivity of 87.5% and an accuracy of 99.85%.

Keywords: data mining, e-commerce, k-Nearest Neighbors, k-NN, log file analysis, online store, R-project, supervised classification, Web mining, Web store, Web traffic, Web usage mining.
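The classification step and the two reported metrics can be illustrated with a minimal sketch. It is not the authors' implementation (the keywords suggest they worked in R); it is a plain-Python k-NN with Euclidean distance and majority voting. The two session features (pages viewed, items added to the cart), the toy data, and the function names `knn_predict` and `sensitivity_and_accuracy` are assumptions made for illustration only.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k):
    """Return the majority label among the k training points closest to query."""
    # Pair each training point's Euclidean distance to the query with its label.
    neighbors = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


def sensitivity_and_accuracy(y_true, y_pred, positive=1):
    """Sensitivity = TP / (TP + FN) for the positive class; accuracy = correct / total."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    return sensitivity, correct / len(y_true)


# Hypothetical sessions: (pages viewed, items added to cart).
# Label 1 = buying session, 0 = browsing session.
train_X = [(2, 0), (3, 0), (5, 0), (4, 1), (6, 0), (10, 2), (12, 3), (9, 2), (14, 1)]
train_y = [0, 0, 0, 0, 0, 1, 1, 1, 1]

test_X = [(3, 1), (11, 2), (5, 1), (13, 2)]
test_y = [0, 1, 0, 1]

# k is the neighborhood size; the paper compares several values of k.
predictions = [knn_predict(train_X, train_y, x, k=3) for x in test_X]
sensitivity, accuracy = sensitivity_and_accuracy(test_y, predictions)
print(predictions, sensitivity, accuracy)
```

Sensitivity is reported alongside accuracy because buying sessions are usually a small minority of all sessions, so a classifier can reach high accuracy while missing most purchases. On real log data the features would normally be rescaled before distances are computed, since k-NN is sensitive to differences in feature ranges.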
