99 Deduplication Problems

Deduplication is a widely studied capacity optimization technique that replaces redundant regions of data with references to a previously stored copy. Deduplication is not only an active area of academic research; numerous vendors also sell deduplicated storage products. Historically, most deduplication publications have focused on a narrow range of topics: maximizing deduplication ratios and improving read/write performance. While future research will continue to optimize these areas, we believe that numerous novel, deduplication-specific problems have been largely ignored by the academic community. Based on feedback from customers and on internal architecture discussions, we present new deduplication problems that we hope will spur the next generation of research.
