A soft computing approach for benign and malicious web robot detection

We propose a method called SMART (Soft computing for MAlicious RoboT detection). The method distinguishes benign robots, malicious robots, and human visitors to a web server. SMART selects its features for a particular web server using fuzzy rough set theory. A graph-based clustering algorithm then classifies sessions into the three agent types. Analyses of web logs suggest state-of-the-art results in detecting both robot types.

Accurately detecting web robot sessions in a web server log is essential for taking reliable traffic-level measurements and for protecting the performance and the privacy of information on a web server. Moreover, the potentially irrecoverable damage from malicious robots cannot be ignored: these robots intentionally try to evade web server intrusion detection systems by covering up their visits with fabricated fields in their HTTP request packets. To separate both types of robots from humans in practice, analysts turn either to heuristic methods or to state-of-the-art soft computing approaches that have been tuned only to the characteristics of one kind of web server. Because the landscape of web robot agents is ever changing, and because behavioral patterns and session characteristics vary across web servers, both options fall short. To overcome this challenge, this paper presents SMART, a soft computing system that simultaneously detects benign and malicious robot agents from web server logs and automatically adapts to the session characteristics of a given web server. Experiments on access logs from several web servers, each serving a different domain of the web, show that the proposed method outperforms state-of-the-art methods for both benign and malicious robot detection.
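To make the graph-based clustering step more concrete, the following is a minimal sketch and not the paper's implementation. The abstract does not name the clustering algorithm, so the sketch assumes Markov clustering (MCL) applied to a session-similarity graph. The similarity values, the function names, and the parameters `inflation`, `max_iter`, `tol`, and `eps` are all hypothetical and chosen only for illustration.

```python
# Hypothetical sketch: Markov clustering (MCL) over a session-similarity graph.
# Assumption: MCL stands in for the unnamed graph-based algorithm in the abstract.

def normalize_columns(m):
    """Scale every column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]

def matmul(a, b):
    """Plain square matrix product; fine for a toy example."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def markov_cluster(similarity, inflation=2.0, max_iter=100, tol=1e-9, eps=1e-6):
    """Return one cluster label per session."""
    n = len(similarity)
    # Add self-loops, then turn similarities into transition probabilities.
    m = normalize_columns([[similarity[i][j] + (1.0 if i == j else 0.0)
                            for j in range(n)] for i in range(n)])
    for _ in range(max_iter):
        expanded = matmul(m, m)  # expansion: flow spreads along paths
        inflated = normalize_columns(
            [[v ** inflation for v in row] for row in expanded]
        )  # inflation: strong flows are boosted, weak flows fade
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Sessions that still exchange flow above eps end up in the same cluster.
    labels = [-1] * n
    next_label = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:
            u = stack.pop()
            for v in range(n):
                if labels[v] == -1 and (m[u][v] > eps or m[v][u] > eps):
                    labels[v] = next_label
                    stack.append(v)
        next_label += 1
    return labels

# Toy similarity matrix for five sessions (made-up values):
# sessions 0-2 resemble one another, as do sessions 3-4.
sim = [
    [0.0, 0.9, 0.8, 0.0, 0.0],
    [0.9, 0.0, 0.85, 0.0, 0.0],
    [0.8, 0.85, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.9],
    [0.0, 0.0, 0.0, 0.9, 0.0],
]
labels = markov_cluster(sim)
print(labels)
```

In a setting like the one the abstract describes, the similarity matrix would be computed from the session features selected for the specific web server, and the resulting clusters would still need to be mapped to the three agent types (human, benign robot, malicious robot); that mapping is not shown here.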
