Web traffic modeling at finer time scales and performance implications

The performance of Web sites continues to be an important research topic. Such studies are invariably based on the access logs from the servers comprising the Web site. A problem with existing access logs is the coarse granularity of their timestamps, e.g., the recorded arrival times. In this study we demonstrate and quantify the significant differences in performance obtained under diverse assumptions about the arrival process of user requests derived from the access logs: the corresponding user response times can differ by more than an order of magnitude. This motivates the need for a general methodology to construct accurate representations of the actual arrival process of user requests from existing coarse-grained access-log data. Our analysis of the access logs from representative commercial Web sites shows that the arrival process exhibits self-similar behavior. We propose a drill-down methodology that constructs the arrival process at finer time scales from the self-similar properties of the arrival process observed at the coarse logging time scale. The advantage of our approach is that it keeps the properties of the arrival processes consistent across the coarser and finer time scales. In addition, our analysis of the request size distribution from commercial Web sites shows that it is subexponential but not heavy-tailed (power-law). Through simulations, we investigate the impact of these different traffic models on user response times.
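
To make the sensitivity to arrival assumptions concrete, the following is a minimal sketch and not the drill-down methodology of the paper. It assumes a log that records only per-second request counts, expands those counts into individual arrival times under three hypothetical placement rules ("batch", "even", "uniform"), and feeds them to a single-server FIFO queue. The function names, the placement rules, and the one-second granularity are all illustrative assumptions.

```python
import random


def expand_arrivals(counts, mode, rng=None):
    """Turn per-second request counts into individual arrival times.

    mode is one of three illustrative (assumed) placement rules:
      "batch"   - every request arrives at the start of its second
      "even"    - requests are evenly spaced within the second
      "uniform" - requests are placed uniformly at random within the second
    """
    rng = rng or random.Random(0)
    times = []
    for second, n in enumerate(counts):
        if mode == "batch":
            times.extend([float(second)] * n)
        elif mode == "even":
            times.extend(second + (i + 0.5) / n for i in range(n))
        elif mode == "uniform":
            times.extend(sorted(second + rng.random() for _ in range(n)))
        else:
            raise ValueError(f"unknown mode: {mode}")
    return times


def mean_response_time(arrivals, services):
    """Mean response time (waiting plus service) in a FIFO single-server queue."""
    server_free_at = 0.0
    total = 0.0
    for arrival, service in zip(arrivals, services):
        start = max(arrival, server_free_at)  # wait if the server is busy
        server_free_at = start + service
        total += server_free_at - arrival
    return total / len(arrivals)


if __name__ == "__main__":
    counts = [4, 0, 6, 2, 8]            # hypothetical per-second log counts
    services = [0.1] * sum(counts)      # assumed constant service time
    for mode in ("batch", "even", "uniform"):
        arrivals = expand_arrivals(counts, mode)
        print(mode, mean_response_time(arrivals, services))
```

Even in this toy setting the same coarse counts give different mean response times depending on how requests are placed inside each second, which is the effect the study quantifies on real logs with more realistic traffic models.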
