Detecting and Characterizing Web Bot Traffic in a Large E-commerce Marketplace

A certain amount of web traffic on the Internet is attributed to web bots. Web bot traffic has raised serious concerns among website operators because it usually consumes considerable resources at web servers, resulting in high workloads and longer response times, while bringing in no profit. Even worse, the content of the crawled pages might later be used for fraudulent activities. Thus, it is important to detect and characterize web bot traffic. In this paper, we first propose an efficient approach to detect web bot traffic in a large e-commerce marketplace and then perform an in-depth analysis of the characteristics of that traffic. Specifically, our proposed bot detection approach consists of the following modules: (1) an Expectation Maximization (EM)-based feature selection method to extract the most distinguishable features, (2) a gradient-based decision tree to calculate the likelihood of an IP being a bot, and (3) a threshold estimation mechanism that aims to recover a reasonable amount of non-bot traffic flow. The detection approach has been applied to the Taobao/Tmall platforms, and its detection capability has been demonstrated by identifying a considerable amount of web bot traffic. Based on data samples of traffic originating from web bots and normal users, we conduct a comparative analysis to uncover the behavioral patterns in which web bots differ from normal users. The analysis results reveal differences in active time, search queries, item and store preferences, and many other aspects. These findings provide new insights for public websites to further improve web bot traffic detection and protect valuable web content.
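
The abstract does not spell out how the threshold in module (3) is estimated, so the following is only a minimal illustrative sketch under stated assumptions, not the authors' method. It assumes each IP already has a bot-likelihood score (for example, from the tree model in module (2)) and that a small labeled validation sample exists. The function name `pick_threshold` and the parameter `max_false_positive_rate` are hypothetical. The sketch picks the lowest threshold that keeps the share of known non-bot IPs flagged as bots within a chosen budget.

```python
from typing import Sequence


def pick_threshold(
    scores: Sequence[float],
    is_bot: Sequence[bool],
    max_false_positive_rate: float = 0.01,
) -> float:
    """Return a score threshold; IPs scoring strictly above it are flagged.

    Hypothetical helper: the threshold is the lowest one for which the
    fraction of labeled non-bot IPs flagged stays within the budget.
    """
    # Scores of IPs labeled as normal users, highest first.
    human_scores = sorted(
        (s for s, bot in zip(scores, is_bot) if not bot), reverse=True
    )
    if not human_scores:
        raise ValueError("need at least one labeled non-bot IP")

    # Number of normal-user IPs we tolerate being flagged as bots.
    allowed = int(max_false_positive_rate * len(human_scores))
    allowed = min(allowed, len(human_scores) - 1)

    # At most `allowed` human scores are strictly greater than this value.
    return human_scores[allowed]


# Toy validation sample (made-up numbers, for illustration only).
sample_scores = [0.95, 0.9, 0.8, 0.4, 0.3, 0.1]
sample_labels = [True, True, False, False, False, False]

threshold = pick_threshold(sample_scores, sample_labels, max_false_positive_rate=0.25)
flagged = [i for i, s in enumerate(sample_scores) if s > threshold]
```

Tightening `max_false_positive_rate` raises the threshold, so fewer normal users are misclassified at the cost of letting more bot traffic through. How that trade-off is set on a real platform is a policy decision the abstract leaves open.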
