Dmdedup: Device Mapper Target for Data Deduplication

We present Dmdedup, a versatile and practical primary-storage deduplication platform suitable for both regular users and researchers. Dmdedup operates at the block layer, so it is usable with existing file systems and applications. Since most deduplication research focuses on metadata management, we designed and implemented a flexible backend API that lets developers easily build and evaluate various metadata management policies. We implemented and evaluated three backends: an in-RAM table, an on-disk table, and an on-disk copy-on-write (COW) B-tree. We evaluated Dmdedup under a variety of workloads and report the results here. Although it was initially designed for research flexibility, Dmdedup is fully functional and can be used in production. Under many real-world workloads, Dmdedup’s throughput exceeds that of a raw block device by 1.5–6×.
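To make the idea of a pluggable metadata backend concrete, the following is a minimal Python sketch of block-level deduplication behind a key-value interface. It is not Dmdedup's actual API or kernel code: the names `MetadataBackend`, `InRamBackend`, and `DedupDevice`, the choice of SHA-256, and the 4096-byte block size are all assumptions made for illustration, and the sketch omits reference counting and garbage collection of overwritten blocks.

```python
import hashlib
from abc import ABC, abstractmethod


class MetadataBackend(ABC):
    """Hypothetical key-value interface for deduplication metadata."""

    @abstractmethod
    def get(self, key):
        """Return the stored value for key, or None if absent."""

    @abstractmethod
    def put(self, key, value):
        """Store value under key, replacing any previous value."""


class InRamBackend(MetadataBackend):
    """Simplest possible backend: a dictionary held in memory."""

    def __init__(self):
        self._table = {}

    def get(self, key):
        return self._table.get(key)

    def put(self, key, value):
        self._table[key] = value


class DedupDevice:
    """Toy block device that stores each distinct block only once."""

    def __init__(self, backend, block_size=4096):
        self.backend = backend
        self.block_size = block_size
        self.physical = []  # stand-in for the underlying data device

    def write(self, lbn, data):
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block")
        digest = hashlib.sha256(data).digest()
        # Hash index: content digest -> physical block number (PBN).
        pbn = self.backend.get(("hash", digest))
        if pbn is None:
            # First time this content is seen: allocate a new block.
            pbn = len(self.physical)
            self.physical.append(data)
            self.backend.put(("hash", digest), pbn)
        # LBN mapping: logical block number -> physical block number.
        self.backend.put(("lbn", lbn), pbn)

    def read(self, lbn):
        pbn = self.backend.get(("lbn", lbn))
        if pbn is None:
            return bytes(self.block_size)  # unwritten blocks read as zeros
        return self.physical[pbn]
```

Because `DedupDevice` touches metadata only through `get` and `put`, an on-disk table or a B-tree could in principle be swapped in by writing another `MetadataBackend` subclass, which is the kind of flexibility the abstract describes.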
