Analysis of aggregated bot and human traffic on e-commerce site

A significant share of Web traffic today is generated by robots. Some of them, such as search-engine crawlers, perform useful tasks on a website, whereas others may be malicious and should be banned. Consequently, there is a growing need to identify bots and to characterize their behavior. This paper investigates the share of bot-generated traffic on an e-commerce site and studies differences between bot and human session-based traffic by analyzing data recorded in Web server log files. Results show that the two kinds of sessions differ in several characteristics, including session duration, the number of pages visited per session, the number of requests, the volume of data transferred, the mean time per page, the number of images per page, and the percentage of pages with unassigned referrers.
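As an illustration only, the session characteristics listed above could be computed from parsed log entries roughly as follows. This is a minimal sketch, not the paper's procedure: the `Request` record, the 30-minute inactivity timeout, the use of a single client key to group requests, and the extension-based rule for telling pages from images are all assumptions introduced here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Assumptions for this sketch (not taken from the paper):
SESSION_TIMEOUT = timedelta(minutes=30)           # inactivity gap that ends a session
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif")
STATIC_EXTENSIONS = IMAGE_EXTENSIONS + (".css", ".js", ".ico")


@dataclass
class Request:
    """One parsed Web server log entry (hypothetical record layout)."""
    client: str               # session key, e.g. IP address plus user agent
    time: datetime
    path: str
    bytes_sent: int
    referrer: Optional[str]   # None when the log shows no referrer ("-")


def is_image(path: str) -> bool:
    return path.lower().endswith(IMAGE_EXTENSIONS)


def is_page(path: str) -> bool:
    # Assumed rule: anything that is not a static resource counts as a page.
    return not path.lower().endswith(STATIC_EXTENSIONS)


def split_sessions(requests):
    """Group requests per client, starting a new session after a long gap."""
    sessions = []
    open_sessions = {}  # client -> that client's most recent session
    for req in sorted(requests, key=lambda r: r.time):
        current = open_sessions.get(req.client)
        if current is None or req.time - current[-1].time > SESSION_TIMEOUT:
            current = []
            sessions.append(current)
            open_sessions[req.client] = current
        current.append(req)
    return sessions


def session_features(session):
    """Compute the per-session characteristics named in the abstract."""
    pages = [r for r in session if is_page(r.path)]
    images = [r for r in session if is_image(r.path)]
    duration = (session[-1].time - session[0].time).total_seconds()
    n_pages = len(pages)
    no_referrer = sum(1 for r in pages if r.referrer is None)
    return {
        "duration_s": duration,
        "requests": len(session),
        "pages": n_pages,
        "bytes": sum(r.bytes_sent for r in session),
        # Ratios are reported as 0.0 for sessions without any page request.
        "mean_time_per_page_s": duration / n_pages if n_pages else 0.0,
        "images_per_page": len(images) / n_pages if n_pages else 0.0,
        "pct_pages_no_referrer": 100.0 * no_referrer / n_pages if n_pages else 0.0,
    }
```

Feeding the output of `split_sessions` through `session_features` yields one feature record per session, which could then be compared between sessions labelled as bot and as human. The labelling step itself is outside this sketch.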
