This paper presents Tstat [1], a new tool for the collection and statistical analysis of TCP/IP traffic, able to infer TCP connection status from trace data. Discussing its use, we present some of the performance figures that can be obtained and the insight that such figures can give into TCP/IP protocols and the Internet. While field measurements have always been the starting point for network planning and dimensioning, their statistical analysis beyond simple traffic volume estimation is not so common. One of the main reasons is the enormous number of performance figures that can be devised in TCP/IP networks. Tstat automatically derives about 80 different performance indices at both the IP and the TCP level, allowing a very deep insight into network performance. While standard performance measures, such as flow dimensions and traffic distribution, remain the basis of traffic evaluation, more sophisticated indices, like the out-of-order probability and gap dimension in TCP connections, obtained by correlating the incoming and outgoing traffic, give reliable estimates of network performance from the user perspective as well. Several of these indices are discussed using traffic measurements collected for more than 2 months on the access link of our institution.
I. TRAFFIC MEASURES IN THE INTERNET
Planning and dimensioning of telecommunication (TLC) networks has always been based on traffic measurements, upon which estimates and models are built to be used with the appropriate mathematical tools. While this process proved to be reasonably simple in traditional, circuit-switched telephone networks, it seems to be much harder in packet-switched data networks, especially in the Internet, where the TCP/IP client-server communication paradigm inherently introduces correlation among traffic relations both in space and in time.
While a large part of this difficulty lies in the failure of traditional modeling paradigms [2], [3], there are also several key points to be solved in performing the measurements themselves and, most of all, in organizing the enormous amount of data that they produce.
First of all, the client-server communication paradigm implies that the traffic behavior has meaning only when the forward and backward traffic are jointly analyzed; otherwise half of the story goes unwritten and can hardly be inferred. This problem makes measuring inherently difficult; it can be solved if measurements are taken at the network edge, where the outgoing and incoming flows are necessarily coupled, but it can prove impossible in the backbone, where the peering contracts among providers often separate the forward and backward routes [4].
Second, data traffic must be characterized at a higher level of detail than voice traffic, since the 'always-on' characteristics of most sources and the very nature of packet switching require the collection of data at the session, flow, and packet level, while circuit-switched traffic is well characterized at the connection level alone. This is due to the source model of the traffic, which is well characterized and relatively simple in the case of voice traffic, but more complex and variable in the case of data networks, where different application models can coexist and interact. Notice that, in the absence of CAC (Connection Admission Control) functions and in the presence of connectionless services, the notion of connection itself becomes quite fuzzy in the Internet.
This work was supported by the Italian Ministry for University and Scientific Research through the PLANET-IP Project.
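The first point above, that forward and backward traffic must be analyzed jointly, can be illustrated with a minimal sketch. It is not Tstat's implementation: the `Packet` record, the function names, and the direction labels are hypothetical, chosen only to show how the two directions of a connection can be mapped to a single key and how flows observed in one direction only (as may happen on a backbone link) can be flagged.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    # Hypothetical minimal record; a real trace carries many more fields.
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    payload_len: int


def flow_key(pkt):
    """Return a direction-independent flow key and the packet's direction."""
    a = (pkt.src_ip, pkt.src_port)
    b = (pkt.dst_ip, pkt.dst_port)
    # Order the two endpoints so that both directions share one key.
    if a <= b:
        return (a, b), "a_to_b"
    return (b, a), "b_to_a"


def couple_flows(packets):
    """Accumulate [packets, bytes] per direction for each bidirectional flow."""
    flows = defaultdict(lambda: {"a_to_b": [0, 0], "b_to_a": [0, 0]})
    for pkt in packets:
        key, direction = flow_key(pkt)
        flows[key][direction][0] += 1
        flows[key][direction][1] += pkt.payload_len
    return dict(flows)


def one_sided(flows):
    """List flows for which no packet was seen in one of the two directions."""
    return [
        key
        for key, stats in flows.items()
        if stats["a_to_b"][0] == 0 or stats["b_to_a"][0] == 0
    ]
```

Packets are counted as well as bytes so that a direction carrying only empty acknowledgments is not mistaken for a missing one. On an edge link one would expect `one_sided` to return few flows; on a link with asymmetric routing it may return most of them.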
Finally, the complexity and layered structure of the TCP/IP protocol suite require the analysis of traffic at no fewer than three different layers (IP, TCP/UDP, Application) in order to have a picture of the traffic clear enough to allow the interpretation of data.
Starting from the pioneering work of Danzig [5], [6], [7] and of Paxson and Floyd [2], [8], in which the authors characterized the traffic of the "first Internet" via measurements, there has been ever-increasing interest in data collection, measurement, and analysis, aimed at characterizing either the network protocols or user behavior. After the birth of the Web, much effort has been devoted to studying caching and content delivery architectures, which intrinsically rely on a deep knowledge of traffic and user behavior. Thus many works analyze traces at the application level, typically log files of web servers or proxy servers [9], [10], [11]. Such traces are very helpful for understanding user behavior, but less interesting from the network point of view. Many projects instead use real traffic traces, captured from large campus networks, like the work in [12], where the authors characterize the HTTP protocol by using
[1] Azer Bestavros, et al. Self-similarity in World Wide Web traffic: evidence and possible causes, 1996, SIGMETRICS '96.
[2] Peter B. Danzig, et al. tcplib: A Library of TCP Internetwork Traffic Characteristics, 2002.
[3] Michael J. Feeley, et al. The Measured Access Characteristics of World-Wide-Web Client Proxy Caches, 1997, USENIX Symposium on Internet Technologies and Systems.
[4] Vern Paxson, et al. Empirically derived analytic models of wide-area TCP connections, 1994, TNET.
[5] Vern Paxson, et al. End-to-end routing behavior in the Internet, 1996, TNET.
[6] Li Fan, et al. Summary cache: a scalable wide-area web cache sharing protocol, 2000, TNET.
[7] Mark Crovella, et al. Self-similarity in World Wide Web: Evidence and possible causes, 1997.
[8] Sally Floyd, et al. Wide area traffic: the failure of Poisson modeling, 1995, TNET.
[9] Bruce A. Mah, et al. An empirical model of HTTP network traffic, 1997, Proceedings of INFOCOM '97.
[10] Dong Lin, et al. IP packet generation: statistical models for TCP start times based on connection-rate superposition, 2000, SIGMETRICS '00.
[11] Anja Feldmann, et al. Performance of Web proxy caching in heterogeneous bandwidth environments, 1999, IEEE INFOCOM '99, Eighteenth Annual Joint Conference of the IEEE Computer and Communications Societies.
[12] Van Jacobson, et al. TCP Extensions for High Performance, 1992, RFC.
[13] Luca Deri, et al. Effective traffic measurement using ntop, 2000.
[14] Steve Parker, et al. Some Testing Tools for TCP Implementors, 1998, RFC.
[15] Kevin Jeffay, et al. What TCP/IP protocol headers can tell us about the web, 2001, SIGMETRICS '01.
[16] Deborah Estrin, et al. An Empirical Workload Model for Driving Wide-Area TCP/IP Network Simulations, 2001.
[17] Peter B. Danzig, et al. Characteristics of wide-area TCP/IP conversations, 1991, SIGCOMM 1991.