Low-Profile Source-side Deduplication for Virtual Machine Backup

This paper presents a source-side backup scheme that keeps resource usage low through collaborative deduplication and approximated lazy deletion, targeting large-scale cloud clusters that require frequent virtual machine snapshot backups. The key ideas are to orchestrate multi-round batches of duplicate detection among machines in a partitioned, asynchronous manner, and to remove most unreferenced content chunks through approximated snapshot deletion. The paper discusses the challenges, the main design and strategies, and the evaluation results.
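
As a rough illustration of partitioned, batched duplicate detection, the following minimal sketch routes chunk fingerprints to partitions by hash and lets each partition process its batch of requests against its own index. It is not the paper's algorithm: the names `fingerprint`, `partition_of`, `batch_requests`, and `detect_duplicates`, the use of SHA-1, the partition count, and the in-memory index are all assumptions made for illustration, and the asynchronous scheduling across rounds is omitted.

```python
import hashlib
from collections import defaultdict

NUM_PARTITIONS = 4  # assumed value, chosen only for the example


def fingerprint(chunk: bytes) -> str:
    """Content fingerprint of a chunk (SHA-1 is an assumption here)."""
    return hashlib.sha1(chunk).hexdigest()


def partition_of(fp: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a fingerprint to a partition using its leading hex digits."""
    return int(fp[:8], 16) % num_partitions


def batch_requests(machine_chunks):
    """Group (machine, fingerprint) lookup requests by target partition."""
    batches = defaultdict(list)
    for machine, chunks in machine_chunks.items():
        for chunk in chunks:
            fp = fingerprint(chunk)
            batches[partition_of(fp)].append((machine, fp))
    return batches


def detect_duplicates(batches, index):
    """Process each partition's batch against that partition's index.

    `index` maps partition id -> set of known fingerprints and is
    updated in place. Returns machine -> list of fingerprints that
    were not yet stored, i.e. chunks that machine still has to upload.
    """
    new_chunks = defaultdict(list)
    for part, requests in batches.items():
        known = index.setdefault(part, set())
        for machine, fp in requests:
            if fp not in known:
                known.add(fp)
                new_chunks[machine].append(fp)
    return new_chunks
```

In this sketch, each partition's batch touches only that partition's index, so the batches could in principle be processed independently; running the same snapshot data through a second round reports no new chunks because every fingerprint is already indexed.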
