A Contribution Towards Solving the Web Workload Puzzle

The World Wide Web, the biggest distributed system ever built, is experiencing tremendous growth and change in Web sites, users, and technology. A realistic and accurate characterization of Web workload is the first, fundamental step in areas such as performance analysis and prediction, capacity planning, and admission control. In this paper we present a more detailed and rigorous statistical analysis than previous work of both request-level and session-level characteristics of Web workload, based on empirical data extracted from the actual logs of four Web servers. Our analysis focuses on phenomena such as self-similarity, long-range dependence, and heavy-tailed distributions. Identifying these phenomena in real data is challenging because the existing methods may perform erratically in practice and produce misleading results. We provide a more accurate analysis of the long-range dependence of the request and session arrival processes by removing the trend and periodicity. In addition to the session arrival process (i.e., inter-session characteristics), we study several intra-session characteristics, using several different methods to test for heavy-tailed behavior and to cross-validate the results. Finally, we point out specific problems associated with the methods used for establishing long-range dependence and heavy-tailed behavior of Web workloads. We believe that the comprehensive model presented in this paper is a step towards solving the Web workload puzzle.
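To make the notion of testing for heavy-tailed behavior concrete, the following is a minimal sketch of the Hill estimator, one commonly used way to estimate the tail index of a distribution from its largest observations. The abstract does not say which methods the authors applied, so this is an illustration under stated assumptions rather than their procedure. The function name `hill_tail_index`, the cutoff `k`, and the synthetic Pareto data are all hypothetical.

```python
import math
import random


def hill_tail_index(samples, k):
    """Hill estimate of the tail index alpha from the k largest samples.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if k < 1 or k >= len(samples):
        raise ValueError("k must satisfy 1 <= k < len(samples)")
    ordered = sorted(samples, reverse=True)  # descending order statistics
    threshold = ordered[k]                   # the (k+1)-th largest value
    if threshold <= 0:
        raise ValueError("the Hill estimator needs positive tail values")
    # Sum of log-excesses of the k largest values over the threshold.
    log_excess = sum(math.log(x / threshold) for x in ordered[:k])
    if log_excess == 0:
        raise ValueError("the k largest values are all equal to the threshold")
    return k / log_excess


if __name__ == "__main__":
    # Synthetic stand-in for a measured quantity such as session length:
    # Pareto samples with a true tail index of 1.5 (an assumed value).
    rng = random.Random(0)
    data = [rng.paretovariate(1.5) for _ in range(10_000)]
    for k in (50, 200, 1000):
        print(k, round(hill_tail_index(data, k), 3))
```

The estimate depends on the choice of `k`, so in practice it is usually inspected over a range of cutoffs and compared against other methods. This sensitivity is one example of why a single method can give misleading results on real data.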
