A protocol-independent technique for eliminating redundant network traffic

We present a technique for identifying repetitive information transfers and use it to analyze the redundancy of network traffic. Our insight is that dynamic content, streaming media, and other traffic that today's Web caches do not catch is nonetheless likely to derive from similar information. We have therefore adapted similarity detection techniques to the problem of designing a system that eliminates redundant transfers: we identify byte ranges that repeat across packets so that the redundant data need not be retransmitted. We find a high level of redundancy and detect repetition that Web proxy caches cannot. In our traces, after Web proxy caching has been applied, an additional 39% of the original volume of Web traffic is found to be redundant. Moreover, because our technique makes no assumptions about HTTP protocol syntax or caching semantics, it provides immediate benefits for other types of content, such as streaming media, FTP traffic, news, and mail.
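The idea of finding repeated byte ranges between packets can be illustrated with a minimal sketch. It is not the system described here: the abstract does not specify the detection mechanism, so this example assumes a plain dictionary keyed on fixed-size byte windows in place of a fingerprinting scheme. The names `WINDOW`, `index_packet`, `find_repeats`, and `redundant_fraction`, the 8-byte window size, and the sample payloads are all hypothetical choices made for illustration.

```python
WINDOW = 8  # assumed window size in bytes; chosen only for this illustration


def index_packet(cache, store, packet_id, payload):
    """Remember a packet and the first position of each WINDOW-byte substring."""
    store[packet_id] = payload
    for i in range(len(payload) - WINDOW + 1):
        # setdefault keeps the earliest occurrence of each window
        cache.setdefault(payload[i:i + WINDOW], (packet_id, i))


def find_repeats(cache, store, payload):
    """Return (start, length, packet_id, offset) for ranges seen in earlier packets."""
    matches = []
    i = 0
    while i + WINDOW <= len(payload):
        hit = cache.get(payload[i:i + WINDOW])
        if hit is None:
            i += 1
            continue
        packet_id, offset = hit
        old = store[packet_id]
        length = WINDOW
        # Extend the match forward byte by byte while both payloads agree.
        while (i + length < len(payload)
               and offset + length < len(old)
               and payload[i + length] == old[offset + length]):
            length += 1
        matches.append((i, length, packet_id, offset))
        i += length
    return matches


def redundant_fraction(matches, payload):
    """Fraction of the payload's bytes covered by repeated ranges."""
    if not payload:
        return 0.0
    return sum(length for _, length, _, _ in matches) / len(payload)


# Usage: two hypothetical payloads that differ only in the requested path.
cache, store = {}, {}
first = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n"
second = b"GET /about.html HTTP/1.1\r\nHost: example.com\r\n\r\n"
index_packet(cache, store, "p0", first)
repeats = find_repeats(cache, store, second)
print(repeats, redundant_fraction(repeats, second))
```

Because the sketch inspects only raw bytes, it never parses HTTP, which is the protocol-independent property the abstract emphasizes. A sender could replace each matched range with a short reference to the earlier packet; that encoding step is omitted here.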
