Analysis and Modeling of World Wide Web Traffic

This dissertation deals with monitoring, collecting, analyzing, and modeling World Wide Web (WWW) traffic and client interactions. The rapid growth of WWW usage has not been accompanied by an overall understanding of models of information resources and their deployment strategies. Consequently, the current Web architecture often faces performance and reliability problems. Scalability, latency, bandwidth, and disconnected operation are some of the important issues to consider when adapting to the growth in Web usage. The WWW Consortium has launched an effort to design a new protocol that can support future demands. Before that can be done, however, we need to characterize current users' interactions with the WWW and understand how it is being used. We focus on proxies because they provide a good medium for caching, information filtering, payment methods, and copyright management. We collected proxy data from our own environment over a period of more than two years, and also from other sources such as schools, information service providers, and commercial sites. Sampling periods range from days to years. We analyzed the collected data for important characteristics that can help in designing a better HTTP protocol. We developed a modeling approach that accounts for Web traffic characteristics such as self-similarity and long-range dependence, an algorithm to characterize users' sessions, and a high-level Web traffic model suitable for sensitivity analysis. As a result of this work, we developed statistical models of parameters such as arrival times, file sizes, file types, and locality of reference. We describe an approach to modeling long-range dependent Web traffic, and we characterize the activities of users accessing a digital library courseware server or Web search tools.
Temporal and spatial locality of reference within the examined user communities is high, so caching can be an effective tool for reducing network traffic and helping to solve the scalability problem. We recommend using our findings to promote a smart distribution or push model that caches documents when repeat accesses are likely.
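
The link between temporal locality and caching benefit can be illustrated with a minimal sketch. It is not the dissertation's own method: the function names, the toy trace, and the choice of an LRU removal policy are assumptions made purely for illustration. The first function counts repeat requests, which bounds the hit ratio any cache could reach on the trace; the second shows how much of that bound a small LRU cache recovers.

```python
from collections import OrderedDict


def repeat_request_ratio(urls):
    """Fraction of requests for a URL already seen earlier in the trace.

    This is the hit ratio of a hypothetical infinite cache, so it is an
    upper bound on what any finite cache can achieve on the same trace.
    """
    seen = set()
    hits = 0
    for url in urls:
        if url in seen:
            hits += 1
        else:
            seen.add(url)
    return hits / len(urls) if urls else 0.0


def lru_hit_ratio(urls, capacity):
    """Hit ratio of an LRU cache that holds at most `capacity` URLs.

    LRU is an assumed policy chosen for simplicity; document sizes and
    expiry times are ignored.
    """
    cache = OrderedDict()
    hits = 0
    for url in urls:
        if url in cache:
            hits += 1
            cache.move_to_end(url)  # mark as most recently used
        else:
            cache[url] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used
    return hits / len(urls) if urls else 0.0


# Hypothetical trace of requested URLs, in arrival order.
trace = ["a", "b", "a", "c", "b", "a"]
print(repeat_request_ratio(trace))
print(lru_hit_ratio(trace, capacity=2))
```

A trace with high temporal locality shows a repeat-request ratio close to 1, and the gap between the two numbers indicates how much cache capacity is needed to exploit it.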
