SDRE: Selective data redundancy elimination for resource constrained hosts

Data redundancy elimination (DRE), also known as data de-duplication, reduces the amount of data to be transferred or stored by identifying and eliminating both intra-object and inter-object duplicated data elements. It is one of the key content delivery acceleration techniques over wide area networks (WANs), reducing delivery latency and bandwidth consumption by shrinking the amount of data that must be transferred. Deploying DRE at the end hosts maximizes bandwidth savings and latency reductions, because the amount of content sent to the destination hosts is minimized. However, the standard DRE process of identifying redundant content chunks is expensive in terms of memory and processing, especially on resource-constrained hosts. By analyzing web application traffic traces, we find that some content types contain more redundant data than others. It is therefore possible to apply DRE selectively and opportunistically to the content types with the most redundant data elements, saving memory and processing resources at the hosts. In this paper, we propose content-type based selective DRE (SDRE), which applies DRE only to the content types that offer the most opportunities for identifying redundant content. We explore the benefits of deploying SDRE on smartphone traffic traces. The results show that SDRE achieves almost the same bandwidth savings as standard DRE while using less computation and less memory.
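The content-type selection idea described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the SDRE implementation: it uses fixed-size 64-byte chunks and SHA-1 fingerprints (many DRE systems instead use content-defined chunking, e.g. with Rabin fingerprints), assumes a 20-byte reference replaces each repeated chunk, and uses a hypothetical `SelectiveDRE` class whose `selected_types` set stands in for the content types that trace analysis identifies as most redundant. Only sender-side byte counts are modeled; a real deployment also needs a synchronized chunk cache at the receiver to reconstruct the data.

```python
import hashlib
from dataclasses import dataclass, field

# Hypothetical parameters chosen for illustration only.
CHUNK_SIZE = 64   # fixed-size chunking (an assumption; not content-defined)
REF_SIZE = 20     # bytes sent for a SHA-1 reference to an already-sent chunk


@dataclass
class SelectiveDRE:
    """Sender-side DRE sketch that deduplicates only selected content types."""
    selected_types: frozenset
    cache: set = field(default_factory=set)  # fingerprints of chunks already sent
    original_bytes: int = 0
    sent_bytes: int = 0

    def send(self, content_type: str, payload: bytes) -> int:
        """Return the number of bytes transmitted for one object."""
        self.original_bytes += len(payload)
        if content_type not in self.selected_types:
            # Unselected type: no hashing and no cache growth, send as-is.
            self.sent_bytes += len(payload)
            return len(payload)

        out = 0
        for i in range(0, len(payload), CHUNK_SIZE):
            chunk = payload[i:i + CHUNK_SIZE]
            fp = hashlib.sha1(chunk).digest()
            if fp in self.cache:
                # Repeated chunk (intra- or inter-object): send the cheaper of
                # a reference or the raw bytes.
                out += min(REF_SIZE, len(chunk))
            else:
                # First occurrence: send the raw chunk and remember it.
                self.cache.add(fp)
                out += len(chunk)
        self.sent_bytes += out
        return out

    def savings(self) -> float:
        """Fraction of original bytes not transmitted."""
        if self.original_bytes == 0:
            return 0.0
        return 1 - self.sent_bytes / self.original_bytes


if __name__ == "__main__":
    dre = SelectiveDRE(selected_types=frozenset({"text/html"}))
    page = b"<html>" + b"A" * 250 + b"</html>"
    print(dre.send("text/html", page))            # 175: repeated chunks within the page
    print(dre.send("text/html", page))            # 87: whole page repeats, references only
    print(dre.send("image/jpeg", b"\xff" * 128))  # 128: unselected type, sent raw
    print(round(dre.savings(), 3))                # 0.404
```

Objects of unselected types bypass hashing and never enter the fingerprint cache, which is where the memory and processing savings come from in this sketch; the trade-off is that any redundancy in those types goes undetected.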
