Design of an exact data deduplication cluster

Data deduplication is an important component of enterprise storage environments. The throughput and capacity limitations of single-node solutions have led to the development of clustered deduplication systems. Most implemented clustered inline solutions trade deduplication ratio for performance and accept missing some redundant data that a single-node system would detect. We present an inline deduplication cluster with a joint distributed chunk index, which is able to detect as much redundancy as a single-node solution. The use of locality and load balancing paradigms enables the nodes to minimize information exchange. We are therefore able to show that, contrary to claims in previous papers, it is possible to combine exact deduplication, small chunk sizes, and scalability within one environment using only a commodity GBit Ethernet interconnect. Additionally, we investigate the throughput and scalability limitations, with a special focus on the inter-node communication.
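To make the idea of a joint distributed chunk index concrete, the following is a minimal sketch, not the system described in the paper. It assumes fixed-size chunks, SHA-1 fingerprints, and a simple modulo placement of fingerprints onto nodes; the class `ChunkIndexCluster` and its methods are hypothetical names introduced only for illustration. Because every fingerprint has exactly one owning index shard, the cluster detects the same duplicates as a single node would.

```python
import hashlib


class ChunkIndexCluster:
    """Toy model of a chunk index partitioned across nodes (hypothetical)."""

    def __init__(self, num_nodes: int):
        # One in-memory fingerprint set per node stands in for its index shard.
        self.shards = [set() for _ in range(num_nodes)]

    def owner(self, fingerprint: bytes) -> int:
        # Assumed placement rule: first 8 fingerprint bytes modulo node count.
        return int.from_bytes(fingerprint[:8], "big") % len(self.shards)

    def store(self, data: bytes, chunk_size: int = 8) -> int:
        """Index all chunks of `data`; return how many chunks were new."""
        new_chunks = 0
        for off in range(0, len(data), chunk_size):
            fp = hashlib.sha1(data[off:off + chunk_size]).digest()
            shard = self.shards[self.owner(fp)]
            if fp not in shard:  # exact lookup on the owning node
                shard.add(fp)
                new_chunks += 1
        return new_chunks


cluster = ChunkIndexCluster(num_nodes=4)
stream = b"abcdefgh" * 3 + b"12345678"
first = cluster.store(stream)    # two distinct chunks are indexed
second = cluster.store(stream)   # everything is already known
```

In this sketch the number of nodes changes only where a fingerprint is stored, never whether a duplicate is found, which is the property that distinguishes exact deduplication from approaches that route or sample and may miss redundancy.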
