A Comparative Analysis of Browsing Behavior of Human Visitors and Automatic Software Agents

In this paper, we compare the access behavior of human visitors and automatic software agents, i.e., web robots, using the access logs of a web portal. We examine in detail resource acquisition trends, hourly activity, entry and exit patterns, geographic origin, user agents, and the distributions of response sizes and response codes for both human visitors and web robots. Web robots continue to proliferate and to grow in sophistication, for both benign and malicious purposes. A significant share of web traffic is attributable to robots, and this fraction is likely to grow over time. The presence of robot entries in web server logs makes it difficult to extract meaningful knowledge about the browsing behavior of actual visitors. Such knowledge is useful for improving services for genuine visitors and for optimizing server resources.
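The abstract does not state how requests were attributed to robots or how the per-class statistics were computed. The Python sketch below shows one possible setup, under stated assumptions: logs are in the Apache Combined Log Format, and robots are separated from humans by a hypothetical user-agent keyword list (`ROBOT_KEYWORDS`). The names `classify` and `summarize` are illustrative only and are not the authors' method. Keyword matching catches only robots that identify themselves in their user agent, which is part of why robot traffic is hard to filter out of logs.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime

# Apache Combined Log Format (an assumption; the paper does not state its log format).
LOG_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical keyword heuristic; real robot detection uses many more signals.
ROBOT_KEYWORDS = ("bot", "crawler", "spider", "slurp")


def classify(agent: str) -> str:
    """Label a request 'robot' if its user agent contains a robot keyword, else 'human'."""
    lowered = agent.lower()
    return "robot" if any(k in lowered for k in ROBOT_KEYWORDS) else "human"


def summarize(lines):
    """Return per-class hourly request counts, status-code counts, and total bytes sent."""
    hourly = defaultdict(Counter)
    status = defaultdict(Counter)
    bytes_sent = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if m is None:
            continue  # skip malformed entries
        cls = classify(m["agent"])
        ts = datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z")
        hourly[cls][ts.hour] += 1
        status[cls][m["status"]] += 1
        if m["size"] != "-":  # '-' means no response body was sent
            bytes_sent[cls] += int(m["size"])
    return hourly, status, bytes_sent


sample = [
    '1.2.3.4 - - [10/Oct/2013:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326 '
    '"-" "Mozilla/5.0 (Windows NT 6.1)"',
    '5.6.7.8 - - [10/Oct/2013:14:02:11 +0000] "GET /robots.txt HTTP/1.1" 404 - '
    '"-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
]
hourly, status, bytes_sent = summarize(sample)
print(dict(hourly["human"]))  # {13: 1}
print(dict(status["robot"]))  # {'404': 1}
print(bytes_sent["human"])    # 2326
```

Keeping the counters keyed by class makes the human and robot distributions directly comparable; entry and exit patterns would additionally require grouping requests into sessions, which this sketch omits.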
