Decentralized Deduplication in SAN Cluster File Systems

File systems hosting virtual machines typically contain many duplicated blocks of data, which wastes storage space and enlarges the storage array cache footprint. Deduplication addresses these problems by storing a single instance of each unique data block and sharing it among all original sources of that data. While deduplication is well understood for file systems with a centralized component, we investigate it in a decentralized cluster file system, specifically in the context of VM storage. We propose DEDE, a block-level deduplication system for live cluster file systems that requires no central coordination, tolerates host failures, and takes advantage of the block layout policies of an existing cluster file system. In DEDE, hosts keep summaries of their own writes to the cluster file system in shared on-disk logs. Each host periodically and independently processes the summaries of the files it holds locks on, merges them with a shared index of blocks, and reclaims any duplicate blocks. DEDE manipulates metadata using general file system interfaces, without knowledge of the file system implementation. We present the design, implementation, and evaluation of our techniques in the context of VMware ESX Server. Our results show an 80% reduction in space with minor performance overhead for realistic workloads.
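
The log-then-merge flow described above can be illustrated with a small in-memory sketch. This is not DEDE's implementation: the names `Volume`, `Host`, and `dedup_pass`, the use of SHA-1 as the block summary, the 4 KB block size, and the reference-counting scheme are all assumptions made for illustration, and file locking and on-disk persistence are omitted.

```python
import hashlib
from dataclasses import dataclass, field

BLOCK_SIZE = 4096  # assumed block size, for illustration only


def digest(data: bytes) -> str:
    """Content summary of a block (SHA-1 is an assumption of this sketch)."""
    return hashlib.sha1(data).hexdigest()


@dataclass
class Host:
    """A host with its own write log of (file, block_no, digest) summaries."""
    name: str
    write_log: list = field(default_factory=list)


class Volume:
    """Toy stand-in for a shared volume; every write allocates a new block."""

    def __init__(self):
        self.blocks = {}    # physical block id -> data
        self.refcount = {}  # physical block id -> number of file references
        self.files = {}     # file name -> {block_no: physical block id}
        self.index = {}     # shared index: digest -> physical block id
        self.next_id = 0

    def _unref(self, pid):
        # Drop one reference and free the block when nothing points to it.
        self.refcount[pid] -= 1
        if self.refcount[pid] == 0:
            del self.blocks[pid]
            del self.refcount[pid]

    def write(self, host, name, block_no, data):
        # Writing to a fresh block means shared blocks are never modified in place.
        pid = self.next_id
        self.next_id += 1
        self.blocks[pid] = data
        self.refcount[pid] = 1
        mapping = self.files.setdefault(name, {})
        old = mapping.get(block_no)
        mapping[block_no] = pid
        if old is not None:
            self._unref(old)
        host.write_log.append((name, block_no, digest(data)))

    def read(self, name, block_no):
        return self.blocks[self.files[name][block_no]]


def dedup_pass(host, vol):
    """Merge one host's write log into the shared index; return blocks reclaimed."""
    reclaimed = 0
    for name, block_no, summary in host.write_log:
        pid = vol.files.get(name, {}).get(block_no)
        # Skip stale entries: the block was overwritten after this log record.
        if pid is None or digest(vol.blocks[pid]) != summary:
            continue
        canonical = vol.index.get(summary)
        if canonical is None or canonical not in vol.blocks:
            vol.index[summary] = pid  # first live instance becomes the shared copy
        elif canonical != pid and vol.blocks[canonical] == vol.blocks[pid]:
            # Byte comparison guards against hash collisions before sharing.
            vol.files[name][block_no] = canonical
            vol.refcount[canonical] += 1
            vol._unref(pid)
            reclaimed += 1
    host.write_log.clear()
    return reclaimed
```

In this sketch each host calls `dedup_pass` on its own schedule, so no coordinator is involved; the only shared state is the index, which a real system would have to protect against concurrent updates.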
