InterCloud RAIDer: A Do-It-Yourself Multi-cloud Private Data Backup System

In this paper, we introduce InterCloud RAIDer, a multi-cloud private data backup system that composes (i) a data deduplication technique to reduce the overall storage overhead; (ii) erasure coding to achieve redundancy at low overhead, with the encoded data dispersed across multiple cloud services to provide fault tolerance against individual service providers; and (iii) a proof of data possession mechanism to detect misbehaving services. Specifically, we use non-systematic instances of erasure codes to provide a basic level of privacy from individual cloud stores, and we optimize the proof of data possession implementation by exploiting the hash digests created in the prior deduplication phase. Apart from the uniqueness and non-triviality of putting these modules together, the system design also had to deal with artefacts of, and heterogeneity across, the different cloud storage services we used, namely Dropbox, Google Drive and SkyDrive.
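To make the reuse of deduplication digests concrete, the following is a minimal sketch and not the system's actual implementation. It assumes fixed-size chunking and SHA-256, neither of which is stated in the abstract. It omits the erasure coding step entirely and replaces a real proof of data possession protocol with a naive spot check that fetches one chunk. The names `deduplicate`, `restore`, `audit` and `CHUNK_SIZE` are hypothetical.

```python
import hashlib
import secrets

CHUNK_SIZE = 4  # hypothetical value, kept tiny so the example is easy to follow


def chunk(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Split data into fixed-size chunks (an assumption; the last may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def deduplicate(data: bytes) -> tuple[list[str], dict[str, bytes]]:
    """Return (recipe, store): the ordered digest list and the unique chunks."""
    store: dict[str, bytes] = {}
    recipe: list[str] = []
    for piece in chunk(data):
        digest = hashlib.sha256(piece).hexdigest()
        store.setdefault(digest, piece)  # keep only the first copy of each chunk
        recipe.append(digest)
    return recipe, store


def restore(recipe: list[str], store: dict[str, bytes]) -> bytes:
    """Rebuild the original data from the recipe and the chunk store."""
    return b"".join(store[d] for d in recipe)


def audit(recipe: list[str], remote_store: dict[str, bytes]) -> bool:
    """Naive spot check: request one random chunk and compare it with the
    digest already kept locally from the deduplication phase."""
    digest = secrets.choice(recipe)  # recipe must be non-empty
    returned = remote_store.get(digest)
    return returned is not None and hashlib.sha256(returned).hexdigest() == digest
```

In this sketch the client keeps only the recipe of digests locally, so the audit needs no additional metadata beyond what deduplication already produced. A real scheme would avoid transferring whole chunks and would audit the encoded fragments held by each provider.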
