Paper Information - Multi-level Selective Deduplication for VM Snapshots in Cloud Storage

In a virtualized cloud computing environment, frequent snapshot backups of virtual disks improve hosting reliability, but the storage demand of such operations is huge. While dirty-bit-based techniques can identify unmodified data between versions, full deduplication with fingerprint comparison can remove more redundant content at the cost of computing resources. This paper presents a multi-level selective deduplication scheme that integrates inner-VM and cross-VM duplicate elimination under a stringent resource requirement. The scheme uses popular common data to facilitate fingerprint comparison while reducing cost, and it strikes a balance between local and global deduplication to increase parallelism and improve reliability. Experimental results show that the proposed scheme can achieve a high deduplication ratio while using a small amount of cloud resources.
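
The abstract gives only a high-level outline, so the following Python sketch is an illustration of the general idea rather than the paper's actual design. It assumes fixed-size blocks, SHA-1 fingerprints, a per-block dirty flag, and a small in-memory set of "popular" fingerprints shared across VMs; the function names (`fingerprint`, `build_popular_index`, `backup_snapshot`) and the level labels are hypothetical.

```python
import hashlib
from collections import Counter


def fingerprint(block: bytes) -> str:
    """Content fingerprint of one block (SHA-1 is an assumption here)."""
    return hashlib.sha1(block).hexdigest()


def build_popular_index(snapshots, top_k):
    """Keep the top_k fingerprints that occur in the most snapshots.

    `snapshots` is an iterable of block lists, one per VM snapshot.
    Each fingerprint is counted at most once per snapshot.
    """
    counts = Counter()
    for blocks in snapshots:
        counts.update({fingerprint(b) for b in blocks})
    return {fp for fp, _ in counts.most_common(top_k)}


def backup_snapshot(blocks, dirty, parent_fps, popular_index):
    """Classify every block of a new snapshot, cheapest check first.

    blocks        -- list of bytes, the new snapshot
    dirty         -- list of bool, True if the block changed since the parent
    parent_fps    -- fingerprints of the parent snapshot, indexed by block
    popular_index -- set of fingerprints shared across VMs (assumed small)

    Returns (recipe, new_blocks): recipe is a list of (level, fingerprint),
    new_blocks maps fingerprints to the data that must actually be stored.
    """
    recipe = []
    new_blocks = {}
    local_fps = set(parent_fps)  # fingerprints already known for this VM
    for i, block in enumerate(blocks):
        # Level 1: dirty bit is clear, so reuse the parent's block unhashed.
        if not dirty[i]:
            recipe.append(("unchanged", parent_fps[i]))
            continue
        fp = fingerprint(block)
        if fp in local_fps:
            # Level 2: duplicate within the same VM (inner-VM).
            recipe.append(("inner-vm", fp))
        elif fp in popular_index:
            # Level 3: duplicate of popular data shared across VMs.
            recipe.append(("cross-vm", fp))
        else:
            # No match at any level: store the block.
            recipe.append(("new", fp))
            new_blocks[fp] = block
        local_fps.add(fp)
    return recipe, new_blocks
```

In this sketch only dirty blocks are hashed, and only blocks that miss the local index are looked up in the shared one, which is one way to read the trade-off between local and global deduplication described above.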
