Transparent Data Deduplication in the Cloud

Cloud storage providers such as Dropbox and Google Drive rely heavily on data deduplication to save storage costs by storing only one copy of each uploaded file. Although recent studies report that whole-file deduplication can achieve up to 50% storage reduction, users do not directly benefit from these savings, as there is no transparent relation between effective storage costs and the prices offered to users. In this paper, we propose a novel storage solution, ClearBox, which allows a storage service provider to transparently attest to its customers the deduplication patterns of the (encrypted) data that it stores. By doing so, ClearBox enables cloud users to verify the effective storage space that their data occupies in the cloud, and consequently to check whether they qualify for benefits such as price reductions. ClearBox is secure against malicious users and a rational storage provider, and ensures that files can only be accessed by their legitimate owners. We evaluate a prototype implementation of ClearBox using both Amazon S3 and Dropbox as back-end cloud storage. Our findings show that our solution works with the APIs provided by existing service providers without any modifications and achieves performance comparable to existing solutions.
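To make the accounting idea concrete, the gap between what users upload and what the provider actually stores under whole-file deduplication can be illustrated with a minimal sketch. This is not ClearBox's protocol: it deduplicates plaintext by SHA-256 digest instead of encrypted data, performs no attestation by the provider, and assumes a hypothetical cost model in which each file's size is split equally among all users who own it. The class `DedupStore` and its methods are illustrative names only.

```python
import hashlib
from collections import defaultdict


class DedupStore:
    """Hypothetical whole-file deduplicating store (illustration only)."""

    def __init__(self):
        self.blobs = {}                 # digest -> file bytes (single physical copy)
        self.owners = defaultdict(set)  # digest -> set of user ids owning that file

    def upload(self, user: str, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.blobs:
            self.blobs[digest] = data   # first upload stores the only copy
        self.owners[digest].add(user)   # later uploads only record ownership
        return digest

    def logical_bytes(self) -> int:
        # Space that would be used if every owner's copy were stored separately.
        return sum(len(self.blobs[d]) * len(u) for d, u in self.owners.items())

    def physical_bytes(self) -> int:
        # Space actually occupied: one copy per distinct file.
        return sum(len(b) for b in self.blobs.values())

    def effective_bytes(self, user: str) -> float:
        # Assumed cost model: each file's size is shared equally by its owners.
        return sum(len(self.blobs[d]) / len(u)
                   for d, u in self.owners.items() if user in u)


if __name__ == "__main__":
    store = DedupStore()
    store.upload("alice", b"x" * 100)
    store.upload("bob", b"x" * 100)   # duplicate of alice's file
    store.upload("bob", b"y" * 50)    # unique to bob
    print(store.logical_bytes())           # 250
    print(store.physical_bytes())          # 150
    print(store.effective_bytes("alice"))  # 50.0
    print(store.effective_bytes("bob"))    # 100.0
```

Under this equal-split model the users' effective sizes sum to the physical storage (50 + 100 = 150 bytes), which is the kind of relation a user could only check if the provider disclosed, and proved, how files are actually deduplicated.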
