Web Workload Characterization: Ten Years Later

In 1996, Arlitt and Williamson [Arlitt et al., 1997] conducted a comprehensive workload characterization study of Internet Web servers. By analyzing access logs from 6 Web sites (3 academic, 2 research, and 1 industrial) collected in 1994 and 1995, the authors identified 10 invariants: workload characteristics that were common to all the sites and likely to persist over time. In this work, we revisit that study, repeating many of the same analyses on new data sets collected in 2004. In particular, we study access logs from the same 3 academic sites used in the 1996 paper. Despite a 30-fold increase in overall traffic volume from 1994 to 2004, our main conclusion is that Web server workload characteristics have not changed dramatically over the last 10 years. Although Web technologies have changed in many ways (e.g., new protocols, scripting languages, caching infrastructures), most of the 1996 invariants still hold today. We postulate that these invariants will continue to hold in the future, because they represent fundamental characteristics of how humans organize, store, and access information on the Web.

[1] Carey L. Williamson, et al. Internet Web servers: workload characterization and performance implications, 1997, TNET.

[2] M. Crovella, et al. Estimating the Heavy Tail Index from Scaling Properties, 1999.

[3] Gregory D. Abowd, et al. Workload of a Media-Enhanced Classroom Server, 2000.

[4] Carey L. Williamson, et al. On filter effects in web caching hierarchies, 2002, TOIT.

[5] G.E. Moore, et al. Cramming More Components Onto Integrated Circuits, 1998, Proceedings of the IEEE.

[6] Kevin Jeffay, et al. Tracking the evolution of Web traffic: 1995-2003, 2003, 11th IEEE/ACM International Symposium on Modeling, Analysis and Simulation of Computer Telecommunications Systems (MASCOTS 2003).

[7] Andrew Odlyzko, et al. Internet traffic growth: sources and implications, 2003, SPIE ITCom.

[8] Magnus Karlsson, et al. Dynamics and evolution of Web sites: analysis, metrics and design issues, 2001, Proceedings. Sixth IEEE Symposium on Computers and Communications.

[9] Eric R. Ziegel, et al. Engineering Statistics, 2004, Technometrics.

[10] Anderson-Darling: A Goodness of Fit Test for Small Samples Assumptions.

[11] Sally Floyd, et al. Wide area traffic: the failure of Poisson modeling, 1995, TNET.

[12] Azer Bestavros, et al. Changes in Web client access patterns: Characteristics and caching implications, 1999, World Wide Web.

[13] Carey L. Williamson, et al. Temporal locality and its impact on Web proxy cache performance, 2000, Perform. Evaluation.

[14] George Kingsley Zipf, et al. Human behavior and the principle of least effort, 1949.

[15] James E. Pitkow. Summary of WWW characterizations, 2004, World Wide Web.