Computer Sciences Department
Understanding and Exploiting Network Traffic Redundancy

Abstract: The Internet carries a vast amount and a wide range of content. Some of this content is more popular, and accessed more frequently, than the rest. The popularity of content can be quite ephemeral (e.g., a Web flash crowd) or much more permanent (e.g., google.com's banner). A direct consequence of this skew in popularity is that, at any time, a fraction of the information carried over the Internet is redundant. We make two contributions in this paper. First, we study the fundamental properties of the redundancy in the information carried over the Internet, with a focus on network edges. We collect traffic traces at two network edge locations: a large university's access link serving roughly 50,000 users, and a tier-1 ISP network link connected to a large data center. We conduct several analyses over this data: What fraction of bytes is redundant? How frequently do strings of bytes repeat across different packets? How much overlap is there in the information accessed by distinct groups of end-users? Second, we leverage our measurement observations in the design of a family of mechanisms for eliminating redundancy in network traffic and improving overall network performance. The mechanisms we propose can improve the available capacity of single network links as well as balance load across multiple network links.
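To make the first measurement question concrete ("what fraction of bytes is redundant?"), the following is a minimal sketch and not the method used in the paper. It assumes packet payloads are already available as byte strings, cuts each payload into fixed-size chunks, and counts a chunk as redundant if an identical chunk was seen earlier in the trace. The function name `redundant_fraction` and the `chunk_size` parameter are hypothetical, chosen only for illustration. Protocol-independent redundancy elimination schemes typically choose chunk boundaries from content fingerprints rather than fixed offsets, so this sketch would miss repeats that are shifted within a packet.

```python
import hashlib


def redundant_fraction(payloads, chunk_size=64):
    """Estimate the fraction of payload bytes that repeat an earlier chunk.

    Assumption: fixed-offset chunking, a simplification chosen for brevity.
    Trailing bytes that do not fill a whole chunk are counted in the total
    but never counted as redundant.
    """
    seen = set()        # digests of every chunk observed so far
    total = 0           # all payload bytes in the trace
    redundant = 0       # bytes belonging to chunks seen before

    for payload in payloads:
        total += len(payload)
        # Walk the payload in non-overlapping, full-size chunks.
        for start in range(0, len(payload) - chunk_size + 1, chunk_size):
            chunk = payload[start:start + chunk_size]
            digest = hashlib.sha1(chunk).digest()
            if digest in seen:
                redundant += chunk_size
            else:
                seen.add(digest)

    return redundant / total if total else 0.0


if __name__ == "__main__":
    # Two synthetic "packets" that share their first 64 bytes.
    trace = [b"a" * 64 + b"b" * 64, b"a" * 64 + b"c" * 64]
    print(redundant_fraction(trace))
```

In this toy trace, one of the four 64-byte chunks repeats an earlier one. The cache of seen chunks here is unbounded; a measurement over a real trace would need to bound it, and the choice of bound is left open in this sketch.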
