Generation and Validation of Empirically-Derived TCP Application Workloads

This dissertation proposes and evaluates a new approach for generating realistic traffic in networking experiments. The main problem our approach solves is generating closed-loop traffic consistent with the behavior of the entire set of applications in modern traffic mixes. Earlier approaches described individual applications in terms of each application's specific semantics. In contrast, we describe the source behavior driving each connection in a generic manner using the a-b-t model. This model provides an intuitive but detailed way of describing source behavior in terms of connection vectors. These vectors capture the sizes and ordering of application data units, the quiet times between them, and whether data exchange is sequential or concurrent. This description is consistent with TCP's own view of traffic, since TCP does not concern itself with application semantics. The a-b-t model also satisfies a crucial property: given a packet header trace collected from an arbitrary Internet link, we can algorithmically infer the source-level behavior driving each connection and cast it into the notation of the model. Processing the packet headers yields a collection of a-b-t connection vectors, which can then be replayed in software simulators and testbed experiments to drive network stacks. Such a replay generates synthetic traffic that fully preserves the feedback loop between the TCP endpoints and the state of the network, which is essential in experiments where network congestion can occur. By construction, this type of traffic generation is fully reproducible, providing a solid foundation for comparative empirical studies. Our experimental work demonstrates the high quality of the generated traffic by directly comparing traces from real Internet links with their source-level trace replays across a rich set of metrics.
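The sketch below illustrates, under simplifying assumptions, what an a-b-t connection vector might look like and how one could be inferred from packet headers. It assumes a common epoch layout for sequential connections. In that layout, each epoch holds an initiator application data unit (ADU) of `a` bytes, an acceptor ADU of `b` bytes, and a quiet time `t` before the next epoch begins. Several names are hypothetical: the `Segment` tuple, the `"A"`/`"B"` direction labels and the function names. The input is assumed to be in order and free of retransmissions, and concurrent connections are not handled. This is not the dissertation's inference algorithm.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical segment record: (timestamp in seconds, direction, payload bytes).
# Direction "A" = initiator -> acceptor, "B" = acceptor -> initiator.
Segment = Tuple[float, str, int]


@dataclass
class Epoch:
    a: int    # bytes in the initiator's ADU (0 if absent)
    b: int    # bytes in the acceptor's ADU (0 if absent)
    t: float  # quiet time (s) before the next epoch starts (0 for the last)


@dataclass
class SequentialConnectionVector:
    epochs: List[Epoch] = field(default_factory=list)


def infer_sequential(segments: List[Segment]) -> SequentialConnectionVector:
    """Group data segments into ADUs, then ADUs into (a, b, t) epochs.

    Assumed simplification: an ADU ends only when the sending direction changes.
    """
    # Step 1: merge consecutive same-direction segments into ADUs,
    # stored as [direction, bytes, start_time, end_time].
    adus = []
    for ts, direction, size in segments:
        if direction not in ("A", "B"):
            raise ValueError(f"unknown direction: {direction!r}")
        if adus and adus[-1][0] == direction:
            adus[-1][1] += size
            adus[-1][3] = ts
        else:
            adus.append([direction, size, ts, ts])

    # Step 2: pair each initiator ADU with the acceptor ADU that follows it.
    vector = SequentialConnectionVector()
    i = 0
    while i < len(adus):
        a = b = 0
        if adus[i][0] == "A":
            a = adus[i][1]
            i += 1
        if i < len(adus) and adus[i][0] == "B":
            b = adus[i][1]
            i += 1
        # Quiet time: gap between the end of this epoch's last ADU and
        # the start of the next ADU.
        end = adus[i - 1][3]
        t = adus[i][2] - end if i < len(adus) else 0.0
        vector.epochs.append(Epoch(a, b, t))
    return vector
```

Consider a 300-byte request answered by three segments totalling 3,500 bytes. Two seconds later it is followed by a 250-byte request and a 900-byte response. If the segments are timestamped at 0, 0.25, 0.5, 0.75, 2.75 and 3.0 seconds, this sketch yields the epochs `(300, 3500, 2.0)` and `(250, 900, 0.0)`.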
Comparing real traces with their replays in this way requires carefully measuring the network parameters of each connection and reproducing them together with the corresponding source behavior. Our final contribution consists of two resampling methods for introducing controlled variability in network experiments and for generating closed-loop traffic that accurately matches a target offered load.
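As a rough illustration of load-targeted resampling, the sketch below draws connection vectors with replacement, in the spirit of the bootstrap. It keeps drawing until the bytes they carry reach the volume implied by a target offered load over a fixed interval. Two simplifying assumptions are made: each connection's contribution is taken to be its total byte count, and the interval has a single fixed length. `resample_to_load` is a hypothetical helper and is not necessarily either of the two methods developed in this dissertation.

```python
import random
from typing import List, Sequence


def resample_to_load(
    conn_bytes: Sequence[int],
    target_bps: float,
    duration_s: float,
    seed: int = 0,
) -> List[int]:
    """Draw connections with replacement until their bytes reach a target load.

    conn_bytes: total bytes (both directions) carried by each connection
    vector in the original trace. Returns indices into conn_bytes.
    """
    if not conn_bytes or max(conn_bytes) <= 0:
        raise ValueError("need at least one connection carrying data")
    if target_bps <= 0 or duration_s <= 0:
        raise ValueError("target load and duration must be positive")
    # Offered load (bits/s) x duration (s) / 8 = bytes the replay must carry.
    target_bytes = target_bps * duration_s / 8
    rng = random.Random(seed)  # a fixed seed keeps the resample reproducible
    chosen: List[int] = []
    total = 0
    while total < target_bytes:
        i = rng.randrange(len(conn_bytes))
        chosen.append(i)
        total += conn_bytes[i]
    return chosen
```

The target volume follows from load × duration / 8. For a 100 Mbps target over a 60-second experiment, the resample must carry 100,000,000 × 60 / 8 = 750,000,000 bytes. Because the generator is seeded, the same seed reproduces the same resample, while a different seed gives a different sample aimed at the same load. In a closed-loop replay, this controls the data the sources offer, not the throughput the network ultimately delivers.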