Website Navigation Behavior Analysis for Bot Detection

Detecting bots is an important goal for most website administrators. In this paper, we propose a novel machine learning approach to bot detection based on local website navigation behavior. While machine learning has been used for bot detection before, most existing approaches rely on general hypotheses derived from statistical analysis over multiple websites and are thus easy to counter. In our work, we instead build a website-specific hypothesis, or classifier, from the actual navigation data of the website. The advantage of our approach is that it can be used to detect any type of bot and is difficult to counter unless bots are designed specifically for the target website. Our classifier uses a Two-Class Boosted Decision Tree classification model and can be periodically re-trained to learn new hypotheses as bots evolve. We tested our approach on two real-world websites and achieved an accuracy of around 83%, outperforming state-of-the-art machine-learning-based bot detection techniques by almost 14%. We also show that our approach can successfully distinguish between various classes of bots, and we show how any website can deploy it as a real-world application to automatically detect bots as they navigate the website.
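To make the idea of a website-specific boosted classifier concrete, the following is a minimal sketch rather than the implementation evaluated in this paper. It uses AdaBoost over depth-1 decision trees (stumps) as a simple stand-in for a two-class boosted decision tree. The three session features (requests per minute, fraction of requests for static assets, mean seconds between page views), the toy sessions, and all function names are assumptions made purely for illustration.

```python
import math

def train_stump(X, y, w):
    """Return (weighted_error, feature, threshold, sign) of the best depth-1 tree."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for sign in (1, -1):
                # Weighted error of the rule: predict `sign` if feature j > thr, else -sign.
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (sign if row[j] > thr else -sign) != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best

def stump_predict(stump, row):
    _, j, thr, sign = stump
    return sign if row[j] > thr else -sign

def fit_adaboost(X, y, rounds=10):
    """Train a boosted ensemble of stumps; labels are +1 (bot) and -1 (human)."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        # Clamp the error so the stump weight stays finite on separable data.
        err = min(max(stump[0], 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, stump))
        # Increase the weight of misclassified sessions, then renormalise.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def predict(model, row):
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in model)
    return 1 if score > 0 else -1

# Hypothetical per-session features extracted from one website's own logs:
# [requests per minute, fraction of static-asset requests, mean seconds between page views]
sessions = [
    [120.0, 0.02, 0.4],
    [95.0, 0.00, 0.6],
    [60.0, 0.05, 1.1],
    [4.0, 0.60, 22.0],
    [6.0, 0.55, 15.0],
    [3.0, 0.70, 40.0],
]
labels = [1, 1, 1, -1, -1, -1]  # invented labels: +1 = bot, -1 = human

model = fit_adaboost(sessions, labels)
```

Under these assumptions, periodic re-training amounts to calling `fit_adaboost` again on freshly labeled sessions from the same website, so the learned hypothesis tracks that site's current traffic rather than a cross-site average.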
