Detection of Web site visitors based on fuzzy rough sets

Despite the emergence of Web 2.0 applications and growing demands for well-behaved Web robots, malicious robots can pose serious risks to Web sites. Regardless of their behavior, Web robots may consume bandwidth and degrade the performance of Web servers. Although much prior research has tried to characterize and classify Web visitors, little attention has been paid to feature selection that dynamically chooses the attributes used to describe Web sessions. In addition, it is practically important to rely on an accurate clustering technique that can handle a large number of samples in a reasonable amount of time. Therefore, this paper proposes a new algorithm, fuzzy rough set–Web robot detection (FRS-WRD), based on fuzzy rough set theory, to better characterize and cluster the visitors of three real Web sites. External evaluations show that, compared with state-of-the-art algorithms, FRS-WRD achieves better results: a G-mean of 95%, a Jaccard index of 88%, an entropy of 0.36, and a purity of 96%. Moreover, according to the confusion matrices, it better detects malicious Web visitors.
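To make the external evaluation measures named above more concrete, the following minimal sketch computes purity, entropy, and the pair-counting Jaccard index for a clustering against ground-truth labels. It is an illustration under common textbook definitions, not the paper's evaluation code: the function names, the base-2 logarithm in the entropy, and the toy session labels are all assumptions, and the paper may normalize these measures differently.

```python
from collections import Counter
from itertools import combinations
from math import log2


def _class_counts(clusters, labels):
    """Group true labels by cluster id: {cluster: Counter(label -> count)}."""
    by_cluster = {}
    for c, y in zip(clusters, labels):
        by_cluster.setdefault(c, Counter())[y] += 1
    return by_cluster


def purity(clusters, labels):
    """Fraction of samples that fall in the majority class of their cluster."""
    by_cluster = _class_counts(clusters, labels)
    return sum(max(cnt.values()) for cnt in by_cluster.values()) / len(labels)


def entropy(clusters, labels):
    """Size-weighted average of per-cluster class entropy (base 2, assumed)."""
    total = len(labels)
    result = 0.0
    for cnt in _class_counts(clusters, labels).values():
        size = sum(cnt.values())
        h = -sum((n / size) * log2(n / size) for n in cnt.values())
        result += (size / total) * h
    return result


def jaccard(clusters, labels):
    """Pair-counting Jaccard index: TP / (TP + FP + FN) over sample pairs."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(labels)), 2):
        same_cluster = clusters[i] == clusters[j]
        same_class = labels[i] == labels[j]
        if same_cluster and same_class:
            tp += 1
        elif same_cluster:
            fp += 1
        elif same_class:
            fn += 1
    denom = tp + fp + fn
    return tp / denom if denom else 1.0


# Hypothetical toy data: six sessions, two clusters, two visitor types.
clusters = [0, 0, 0, 1, 1, 1]
labels = ["human", "human", "robot", "robot", "robot", "robot"]
print(purity(clusters, labels), entropy(clusters, labels), jaccard(clusters, labels))
```

Higher purity and Jaccard values and lower entropy indicate clusters that agree more closely with the true visitor classes, which is the direction in which the figures reported above should be read.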
