Exploiting Non-Uniformities in Redundant Traffic Elimination

Protocol-independent redundant traffic elimination (RTE) at the network layer detects and removes redundant chunks of data from packets by caching at both ends of a network link or path. In this paper, we propose a set of techniques to improve the effectiveness of packet-level RTE. In particular, we consider two bypass techniques, one based on packet size and the other on content type. Both reduce the processing requirements of RTE with little or no adverse impact on redundancy detection. The bypass techniques apply at the front end of the RTE pipeline. Within the pipeline, we propose chunk overlap and oversampling to improve redundancy detection while avoiding the storage and processing requirements of chunk expansion at the network endpoints, as suggested by previous research. Finally, at the back end of the pipeline, we propose savings-based cache management as an improvement over the commonly used FIFO-based cache management. We evaluate our techniques on full-payload packet-level traces from a university environment. Our results show that the 11-12% savings achieved with typical RTE can be improved to 16-18% with our techniques.
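As an illustration only, the general idea of chunk-level RTE with a size-based bypass can be sketched as follows. This is not the implementation evaluated in the paper: the fixed-size chunking, the SHA-1 fingerprints, the FIFO cache, and the names and values `MIN_PACKET_SIZE`, `CHUNK_SIZE`, `ChunkCache`, `encode`, and `decode` are all assumptions chosen for brevity. Practical RTE systems typically select chunks by content-based fingerprint sampling rather than fixed offsets.

```python
import hashlib
from collections import OrderedDict

MIN_PACKET_SIZE = 64  # hypothetical bypass threshold in bytes
CHUNK_SIZE = 32       # hypothetical fixed chunk size in bytes


class ChunkCache:
    """FIFO chunk cache keyed by fingerprint (assumed policy for this sketch)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()

    def insert(self, fp, chunk):
        if fp in self.store:
            return
        if len(self.store) >= self.capacity:
            self.store.popitem(last=False)  # evict the oldest entry
        self.store[fp] = chunk


def fingerprint(chunk):
    # Truncated SHA-1 stands in for the fingerprint function.
    return hashlib.sha1(chunk).digest()[:8]


def encode(payload, cache):
    """Sender side: replace chunks already in the cache with references."""
    if len(payload) < MIN_PACKET_SIZE:
        # Size-based bypass: small packets skip lookup and caching entirely.
        return [("bypass", payload)]
    tokens = []
    for i in range(0, len(payload), CHUNK_SIZE):
        chunk = payload[i:i + CHUNK_SIZE]
        fp = fingerprint(chunk)
        if fp in cache.store:
            tokens.append(("ref", fp))
        else:
            cache.insert(fp, chunk)
            tokens.append(("raw", chunk))
    return tokens


def decode(tokens, cache):
    """Receiver side: rebuild the payload, mirroring the sender's cache inserts."""
    out = bytearray()
    for kind, value in tokens:
        if kind == "ref":
            out += cache.store[value]
        elif kind == "raw":
            cache.insert(fingerprint(value), value)
            out += value
        else:  # "bypass"
            out += value
    return bytes(out)
```

Because both ends insert the same chunks in the same order, the two caches stay synchronized in this sketch, so every reference the sender emits can be resolved by the receiver. Bypassed packets touch neither cache, which is where the processing reduction would come from.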
