Understanding patterns of TCP connection usage with statistical clustering

We describe a new methodology for understanding how applications use TCP to exchange data. The method is useful for characterizing TCP workloads and synthetic traffic generation. Given a packet header trace, the method automatically constructs a source-level model of the applications using TCP in a network without any a priori knowledge of which applications are actually present in a network. From this source-level model, statistical feature vectors can be defined for each TCP connection in the trace. Hierarchical cluster analysis can then be performed to identify connections that are statistically homogeneous and that are likely exerting similar demands on a network. We apply the methods to packet header traces taken from the UNC and Abilene networks and show how classes of similar connections can be automatically detected and modeled.

[1]  Vern Paxson,et al.  Empirically derived analytic models of wide-area TCP connections , 1994, TNET.

[2]  Sally Floyd,et al.  Difficulties in simulating the internet , 2001, TNET.

[3]  W. T. Williams,et al.  Dissimilarity Analysis: a new Technique of Hierarchical Sub-division , 1964, Nature.

[4]  D. Moore,et al.  Building a better NetFlow , 2004, SIGCOMM '04.

[5]  Kevin Jeffay,et al.  What TCP/IP protocol headers can tell us about the web , 2001, SIGMETRICS '01.

[6]  R. Srikant,et al.  Modeling and performance analysis of BitTorrent-like peer-to-peer networks , 2004, SIGCOMM '04.

[7]  Krishna P. Gummadi,et al.  Measurement, modeling, and analysis of a peer-to-peer file-sharing workload , 2003, SOSP '03.

[8]  Robert R. Sokal,et al.  A statistical method for evaluating systematic relationships , 1958 .

[9]  D. Botstein,et al.  Cluster analysis and display of genome-wide expression patterns. , 1998, Proceedings of the National Academy of Sciences of the United States of America.

[10]  Kun-Chan Lan,et al.  Rapid model parameterization from traffic measurements , 2002, TOMC.

[11]  Paul Barford,et al.  Generating representative Web workloads for network and server performance evaluation , 1998, SIGMETRICS '98/PERFORMANCE '98.

[12]  Vincent Kanade,et al.  Clustering Algorithms , 2021, Wireless RF Energy Transfer in the Massive IoT Era.

[13]  Bruce A. Mah,et al.  An empirical model of HTTP network traffic , 1997, Proceedings of INFOCOM '97.

[14]  Kevin Jeffay,et al.  Tracking the evolution of Web traffic: 1995-2003 , 2003, 11th IEEE/ACM International Symposium on Modeling, Analysis and Simulation of Computer Telecommunications Systems, 2003. MASCOTS 2003..

[15]  Peter B. Danzig,et al.  Characteristics of wide-area TCP/IP conversations , 1991, SIGCOMM '91.

[16]  Ross Ihaka,et al.  Gentleman R: R: A language for data analysis and graphics , 1996 .

[17]  Azer Bestavros,et al.  Self-similarity in World Wide Web traffic: evidence and possible causes , 1996, SIGMETRICS '96.

[18]  Ali S. Hadi,et al.  Finding Groups in Data: An Introduction to Chster Analysis , 1991 .

[19]  George Varghese,et al.  Automatically inferring patterns of resource consumption in network traffic , 2003, SIGCOMM '03.

[20]  Azer Bestavros,et al.  Self-similarity in World Wide Web traffic: evidence and possible causes , 1997, TNET.

[21]  Brian Everitt,et al.  Cluster analysis , 1974 .

[22]  Balachander Krishnamurthy,et al.  Traffic classification for application specific peering , 2002, IMW '02.

[23]  D. Botstein,et al.  Cluster analysis and display of genome-wide expression patterns. , 1998, Proceedings of the National Academy of Sciences of the United States of America.