Paper information - VM-centric snapshot deduplication for cloud data backup


Data deduplication is important for snapshot backup of virtual machines (VMs) because snapshots contain a large amount of redundant content. Fingerprint search for source-side duplicate detection is resource intensive when the backup service for VMs is co-located with other cloud services. This paper presents the design and analysis of a fast VM-centric backup service that trades some deduplication efficiency for low resource usage: it stays competitive in deduplication efficiency while consuming few computing resources, which makes it suitable for a converged cloud architecture that cohosts many other services. The design also includes VM-centric file system block management to increase VM snapshot availability. The paper evaluates this VM-centric scheme in terms of deduplication efficiency, resource usage, and fault tolerance.

Wei Zhang | Tao Yang | Richard Wolski | Hong Tang | Daniel Agun

[1] Hao Jiang,et al. Multi-level Selective Deduplication for VM Snapshots in Cloud Storage , 2012, 2012 IEEE Fifth International Conference on Cloud Computing.

[2] Sean Quinlan,et al. Venti: A New Approach to Archival Storage , 2002, FAST.

[3] Sriram Rao,et al. The Quantcast File System , 2013, Proc. VLDB Endow..

[4] Michael Vrable,et al. Cumulus: Filesystem backup to the cloud , 2009, TOS.

[5] Timothy Bisson,et al. iDedup: latency-aware, inline data deduplication for primary storage , 2012, FAST.

[6] Sanjay Ghemawat,et al. The Google file system , 2003.

[7] Aleksey Pesterev,et al. Fast, Inexpensive Content-Addressed Storage in Foundation , 2008, USENIX Annual Technical Conference.

[8] Petros Efstathopoulos,et al. Building a High-performance Deduplication System , 2011, USENIX Annual Technical Conference.

[9] Hairong Kuang,et al. The Hadoop Distributed File System , 2010, 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST).

[10] Ethan L. Miller,et al. The effectiveness of deduplication on virtual machine disk images , 2009, SYSTOR '09.

[11] Mark Lillibridge,et al. Sparse Indexing: Large Scale, Inline Deduplication Using Sampling and Locality , 2009, FAST.

[12] Irfan Ahmad,et al. Decentralized Deduplication in SAN Cluster File Systems , 2009, USENIX Annual Technical Conference.

[13] Darrell D. E. Long,et al. Providing High Reliability in a Minimum Redundancy Archival Storage System , 2006, 14th IEEE International Symposium on Modeling, Analysis, and Simulation.

[14] Kai Li,et al. Tradeoffs in Scalable Data Routing for Deduplication Clusters , 2011, FAST.

[15] Andrew Warfield,et al. Facilitating the Development of Soft Devices , 2005, USENIX Annual Technical Conference, General Track.

[16] Geoffrey Zweig,et al. Syntactic Clustering of the Web , 1997, Comput. Networks.

[17] Xiaozhou Li,et al. Reliability analysis of deduplicated and erasure-coded storage , 2011, PERV.

[18] Shmuel Tomi Klein,et al. The design of a similarity based deduplication system , 2009, SYSTOR '09.

[19] Hong Jiang,et al. SAM: A Semantic-Aware Multi-tiered Source De-duplication Framework for Cloud Backup , 2010, 2010 39th International Conference on Parallel Processing.

[20] Grant Wallace,et al. Efficiently Storing Virtual Machine Backups , 2013, HotStorage.

[21] Li Fan,et al. Web caching and Zipf-like distributions: evidence and implications , 1999, IEEE INFOCOM '99. Conference on Computer Communications. Proceedings. Eighteenth Annual Joint Conference of the IEEE Computer and Communications Societies.

[22] Mark Lillibridge,et al. Extreme Binning: Scalable, parallel deduplication for chunk-based file backup , 2009, 2009 IEEE International Symposium on Modeling, Analysis & Simulation of Computer and Telecommunication Systems.

[23] Yucheng Zhang,et al. Design Tradeoffs for Data Deduplication Performance in Backup Workloads , 2015, FAST.

[24] Udi Manber,et al. Finding Similar Files in a Large File System , 1994, USENIX Winter.

[25] Yuanyuan Tian,et al. CoHadoop: Flexible Data Placement and Its Exploitation in Hadoop , 2011, Proc. VLDB Endow..

[26] Hong Jiang,et al. Application-Aware Local-Global Source Deduplication for Cloud Backup Services of Personal Storage , 2014, IEEE Transactions on Parallel and Distributed Systems.

[27] Kave Eshghi,et al. A Framework for Analyzing and Improving Content-Based Chunking Algorithms , 2005.

[28] Kai Li,et al. Avoiding the Disk Bottleneck in the Data Domain Deduplication File System , 2008, FAST.