New biostatistics features for detecting web bot activity on web applications

Abstract: Web bots are malicious scripts that automatically traverse websites, fill in web forms, and illegally scrape data from websites. The persistent threat of web bots causes serious problems for web applications. According to various web bot traffic reports, more than fifty percent of total web traffic comes from web bots. An effective safeguard against automated web bots is to detect human user presence on web applications. Most existing research focuses on detecting specific kinds of web bots, such as form-spamming bots, data-scraping bots, chat bots, and game bots. This paper proposes a web bot detection model that combines supervised and unsupervised machine learning algorithms. It also proposes new Biostatistics features, which are used to identify human user presence on web applications. The Biostatistics features have proven very effective in discriminating human users from general web bots. Various web bot attack scenarios, such as automated account registration, automatic form filling, and data scraping, are created to mimic zero-day web bot attacks. The proposed model is evaluated in numerous experiments using standard evaluation parameters. The result analysis reveals that the proposed model is efficient in discriminating human users from web bots.
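The abstract does not list which evaluation parameters were used. As a minimal sketch, assuming they include the usual confusion-matrix measures for a binary human-versus-bot classifier (accuracy, precision, recall, and F1), the computation could look like the following. The function name `evaluate`, the label strings, and the sample labels are all hypothetical and are not taken from the paper.

```python
def evaluate(y_true, y_pred, positive="bot"):
    """Compute confusion-matrix metrics, treating `positive` as the target class.

    Assumption: the paper's "standard evaluation parameters" are of this kind;
    the abstract does not name them.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    # Guard each ratio against an empty denominator.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0

    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "accuracy": accuracy, "precision": precision,
        "recall": recall, "f1": f1,
    }


# Hypothetical session labels, for illustration only (not data from the paper).
y_true = ["bot", "bot", "bot", "human", "human", "human", "human", "bot"]
y_pred = ["bot", "bot", "human", "human", "human", "bot", "human", "bot"]
metrics = evaluate(y_true, y_pred)
print(metrics)
```

Treating "bot" as the positive class means recall measures the share of bot sessions that are caught, while precision measures how many flagged sessions were really bots, which is the figure that matters when false alarms inconvenience human users.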
