DAC: Improving storage availability with Deduplication-Assisted Cloud-of-Clouds

Abstract

With the growing popularity and rapid development of cloud storage technology, more and more users are uploading their data to cloud storage platforms. However, depending solely on a single cloud storage provider raises several potentially serious problems, such as vendor lock-in and risks to availability and security. To address these problems, this paper proposes DAC, a Deduplication-Assisted primary storage system in Cloud-of-Clouds. DAC eliminates redundant data blocks in the cloud computing environment and distributes the data among multiple independent cloud storage providers by exploiting the reference characteristics of the data. In DAC, data blocks are stored across multiple cloud storage providers using a combination of replication and erasure coding. To exploit the advantages of both schemes, together with the reference characteristics revealed by data deduplication, highly referenced data blocks are stored with replication, while the remaining data blocks are stored with erasure coding. Experiments on our lightweight prototype implementation show that DAC significantly improves performance and cost efficiency compared with existing schemes.
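The reference-aware placement policy described above can be illustrated with a small sketch. It is not DAC's implementation. It assumes fixed-size chunking, SHA-256 fingerprints for deduplication, and a reference-count threshold that decides between replication and erasure coding. A single-parity XOR code stands in for whatever erasure code DAC actually uses. The provider names, REF_THRESHOLD, REPLICAS, BLOCK_SIZE, and all function names are hypothetical.

```python
import hashlib
from collections import Counter

# Hypothetical parameters (assumptions for illustration, not taken from DAC).
PROVIDERS = ["cloudA", "cloudB", "cloudC", "cloudD"]
REF_THRESHOLD = 2   # blocks referenced at least this often are replicated
REPLICAS = 3        # number of full copies kept for replicated blocks
BLOCK_SIZE = 4      # bytes per block, tiny so the example is easy to follow


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Fixed-size chunking (an assumed chunking method)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def fingerprint(block: bytes) -> str:
    """Content fingerprint used to detect duplicate blocks."""
    return hashlib.sha256(block).hexdigest()


def xor_parity_fragments(block: bytes, k: int) -> list[bytes]:
    """Split a block into k data fragments plus one XOR parity fragment.

    A simple stand-in for a real erasure code such as Reed-Solomon:
    any single lost fragment can be rebuilt by XOR-ing the other k.
    """
    frag_len = -(-len(block) // k)  # ceiling division
    padded = block.ljust(frag_len * k, b"\0")
    frags = [padded[i * frag_len:(i + 1) * frag_len] for i in range(k)]
    parity = bytes(frag_len)
    for frag in frags:
        parity = bytes(a ^ b for a, b in zip(parity, frag))
    return frags + [parity]


def place(files: dict[str, bytes]) -> dict[str, dict]:
    """Deduplicate blocks across files, then choose a redundancy scheme per block."""
    refs: Counter = Counter()
    unique: dict[str, bytes] = {}
    for data in files.values():
        for blk in split_blocks(data):
            fp = fingerprint(blk)
            refs[fp] += 1               # count every reference to the block
            unique.setdefault(fp, blk)  # but keep only one stored copy

    layout: dict[str, dict] = {}
    for fp, blk in unique.items():
        if refs[fp] >= REF_THRESHOLD:
            # Highly referenced block: full copies on several providers.
            layout[fp] = {"scheme": "replication", "data": blk,
                          "providers": PROVIDERS[:REPLICAS]}
        else:
            # Less referenced block: one fragment per provider.
            frags = xor_parity_fragments(blk, len(PROVIDERS) - 1)
            layout[fp] = {"scheme": "erasure", "fragments": frags,
                          "providers": PROVIDERS}
    return layout


if __name__ == "__main__":
    demo = place({"a.txt": b"AAAABBBBCCCC", "b.txt": b"AAAADDDD"})
    for fp, entry in demo.items():
        print(fp[:8], entry["scheme"], entry["providers"])
```

In this example the block `AAAA` appears in both files, so it crosses the threshold and is replicated, while the other three unique blocks are erasure-coded. The design intuition is that losing a shared block affects every file that references it, so such blocks get the cheaper-to-read, higher-overhead protection of replication, and rarely shared blocks get the more storage-efficient erasure code.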
