Hierarchical Workload Characterization for a Busy Web Server

This paper introduces the concept of a Web server access hierarchy: a three-tier view that describes the traffic to a Web server as aggregate traffic from multiple clients, as traffic from individual clients, and as traffic within sessions of individual clients. A detailed workload characterization study of this hierarchy was undertaken for a busy commercial server, using an access log of 80 million requests captured over seven days of observation. The behavioural characteristics that emerge from this study differ at each level and suggest effective strategies for managing resources at busy Internet Web servers.
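The three levels of the access hierarchy can be illustrated with a minimal sketch that groups log entries first by client and then into sessions. This is not the study's method. The abstract does not say how clients are identified or how sessions are delimited, so the `(client_id, timestamp)` input format, the `build_hierarchy` name, and the 30-minute inactivity threshold are all assumptions made for illustration.

```python
from collections import defaultdict

# Assumed inactivity threshold in seconds; the abstract gives no value.
SESSION_TIMEOUT = 1800


def build_hierarchy(requests, timeout=SESSION_TIMEOUT):
    """Group (client_id, timestamp) pairs into {client_id: [session, ...]}.

    Each session is a list of timestamps in ascending order. A new session
    starts when the gap since the client's previous request exceeds `timeout`.
    """
    # Level 2: split the aggregate stream by client.
    by_client = defaultdict(list)
    for client_id, timestamp in requests:
        by_client[client_id].append(timestamp)

    # Level 3: split each client's requests into sessions.
    hierarchy = {}
    for client_id, stamps in by_client.items():
        stamps.sort()
        sessions = [[stamps[0]]]
        for ts in stamps[1:]:
            if ts - sessions[-1][-1] > timeout:
                sessions.append([ts])
            else:
                sessions[-1].append(ts)
        hierarchy[client_id] = sessions
    return hierarchy


# Hypothetical sample data: two clients, timestamps in seconds.
sample = [("a", 0), ("b", 10), ("a", 20), ("a", 5000), ("b", 15)]
hierarchy = build_hierarchy(sample)

# Level 1: the aggregate view is the total over all clients and sessions.
total_requests = sum(len(s) for sessions in hierarchy.values() for s in sessions)
```

With this layout, per-level statistics (request counts per client, session lengths, gaps between requests within a session) can each be computed from one nesting level of the resulting dictionary.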
