A protocol-independent technique for eliminating redundant network traffic

We present a technique for identifying repetitive information transfers and use it to analyze the redundancy of network traffic. Our insight is that dynamic content, streaming media and other traffic that is not caught by today's Web caches is nonetheless likely to derive from similar information. We have therefore adapted similarity detection techniques to the problem of designing a system to eliminate redundant transfers. We identify repeated byte ranges between packets to avoid retransmitting the redundant data. We find a high level of redundancy and are able to detect repetition that Web proxy caches cannot. In our traces, after Web proxy caching has been applied, an additional 39% of the original volume of Web traffic is found to be redundant. Moreover, because our technique makes no assumptions about HTTP protocol syntax or caching semantics, it provides immediate benefits for other types of content, such as streaming media, FTP traffic, news and mail.
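To make the idea of finding repeated byte ranges between packets concrete, here is a minimal sketch. It is not the authors' implementation: the abstract does not specify the algorithm, so the class name `RedundancyCache`, the 8-byte `WINDOW`, and the choice to index every window by its raw bytes are all assumptions made for illustration. A practical system would more likely use compact rolling fingerprints and index only a sample of them to bound memory.

```python
WINDOW = 8  # assumed minimum length of a repeated byte range worth detecting


class RedundancyCache:
    """Hypothetical cache of previously seen payloads, indexed by byte windows."""

    def __init__(self):
        self.packets = []  # payloads seen so far
        self.index = {}    # window bytes -> (packet_id, offset) of first sighting

    def process(self, payload: bytes):
        """Return repeated ranges as (start, end, packet_id, src_offset), then cache payload."""
        matches = []
        i, n = 0, len(payload)
        while i + WINDOW <= n:
            hit = self.index.get(payload[i:i + WINDOW])
            if hit is None:
                i += 1
                continue
            pid, off = hit
            old = self.packets[pid]
            # Extend the match to the right, byte by byte, as far as both payloads agree.
            length = 0
            while (i + length < n and off + length < len(old)
                   and payload[i + length] == old[off + length]):
                length += 1
            matches.append((i, i + length, pid, off))
            i += length
        # Index the new payload only after scanning, so it cannot match itself.
        pid = len(self.packets)
        self.packets.append(payload)
        for j in range(n - WINDOW + 1):
            self.index.setdefault(payload[j:j + WINDOW], (pid, j))
        return matches


# Two made-up payloads that differ only in the requested path.
cache = RedundancyCache()
first = b"GET /index.html HTTP/1.0\r\nHost: example.org\r\n\r\n"
second = b"GET /about.html HTTP/1.0\r\nHost: example.org\r\n\r\n"
first_matches = cache.process(first)
second_matches = cache.process(second)
redundant_bytes = sum(end - start for start, end, _, _ in second_matches)
```

Nothing here parses HTTP: the payloads are treated as opaque bytes, which is what lets the same approach apply to other protocols. A sender could replace each matched range with a short reference to the cached packet and offset, provided the receiver holds the same cache.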
