An Automatic Scheme to Categorize User Sessions in Modern HTTP Traffic

The characterization of HTTP traffic is crucial for performance evaluation and server design. In this paper, we analyze massive Web traces generated by various busy servers in recent years to identify the new features of modern HTTP traffic and user behavior. Comparing the conclusions of earlier studies with our results, we find that modern HTTP traffic contains a considerable number of unconventional components that previous models can hardly describe. We also propose a scheme that automatically categorizes these components. The novel aspects of our work are: (1) it reveals the sophisticated composition of modern HTTP traffic with solid evidence; (2) it provides an automatic method to analyze this composition; and (3) it offers a powerful way to evaluate the possible performance implications of modern HTTP traffic for existing Web servers. We hope this work will help researchers and designers better understand the new features of HTTP workloads and adapt their designs accordingly.