Enabling Data Integrity Protection in Regenerating-Coding-Based Cloud Storage: Theory and Implementation

Protecting outsourced data in cloud storage against corruption requires fault tolerance together with efficient data integrity checking and recovery procedures. Regenerating codes provide fault tolerance by striping data across multiple servers, while using less repair traffic than traditional erasure codes during failure recovery. We therefore study the problem of remotely checking the integrity of regenerating-coded data under a real-life cloud storage setting. We design and implement a practical data integrity protection (DIP) scheme for a specific regenerating code that preserves the code's intrinsic properties of fault tolerance and repair-traffic saving. Our DIP scheme is designed under a mobile Byzantine adversarial model and enables a client to feasibly verify the integrity of random subsets of outsourced data against general or malicious corruption. It works under the simple assumption of thin-cloud storage and allows different parameters to be tuned for a performance-security trade-off. We implement our DIP scheme and evaluate its overhead in a real cloud storage testbed under different parameter choices, and we further analyze its security strength via mathematical models. We demonstrate that remote integrity checking can be feasibly integrated into regenerating codes in practical deployment.
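
To illustrate why checking random subsets of outsourced data can be effective, the following minimal sketch computes the chance that a uniform random sample of stored chunks contains at least one corrupted chunk. It is a generic sampling argument and not the paper's own security model; the function name `detection_probability` and its parameters are hypothetical, and the chunk-level granularity is an assumption made only for illustration.

```python
from math import comb


def detection_probability(total_chunks: int,
                          corrupted_chunks: int,
                          sampled_chunks: int) -> float:
    """Chance that a random spot check hits at least one corrupted chunk.

    Assumption (not from the paper): the client samples `sampled_chunks`
    chunks uniformly at random without replacement, and checking a
    corrupted chunk always reveals the corruption.
    """
    if total_chunks <= 0:
        raise ValueError("total_chunks must be positive")
    if not 0 <= corrupted_chunks <= total_chunks:
        raise ValueError("corrupted_chunks must be in [0, total_chunks]")
    if not 0 <= sampled_chunks <= total_chunks:
        raise ValueError("sampled_chunks must be in [0, total_chunks]")

    clean_chunks = total_chunks - corrupted_chunks
    # Probability that every sampled chunk is clean (hypergeometric miss).
    # comb() returns 0 when more chunks are sampled than are clean.
    miss = comb(clean_chunks, sampled_chunks) / comb(total_chunks, sampled_chunks)
    return 1.0 - miss


if __name__ == "__main__":
    # Sweep the sample size for a store with 1% of its chunks corrupted.
    for sampled in (10, 50, 100, 200):
        p = detection_probability(10_000, 100, sampled)
        print(f"sampled={sampled:4d}  detection probability={p:.3f}")
```

Under these assumptions, the detection probability grows with the sample size, which is one way to picture the performance-security trade-off mentioned above: checking more data costs more but misses corruption less often.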
