Proof of ownership for deduplication systems: A secure, scalable, and efficient solution

Abstract: Deduplication is a technique that service providers use to reduce the amount of storage they need. It rests on the observation that several users may want, for different reasons, to store the same content, so keeping a single copy of each such file is sufficient. Although simple in theory, implementing this concept introduces many security risks. In this paper, we address the most severe one: an adversary who possesses only a fraction of the original file, or who colludes with a rightful owner willing to leak arbitrary portions of it, can claim possession of the entire file. The paper's contributions are manifold. First, we review the security issues introduced by deduplication and model the related security threats. Second, we introduce a novel Proof of Ownership (POW) scheme that has all the features of the state-of-the-art solution at only a fraction of its overhead. We also show that the security of the proposed mechanisms relies on information-theoretic rather than computational assumptions, and we propose viable optimization techniques that further improve the scheme's performance. Finally, the quality of our proposal is supported by extensive benchmarking.
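To make the threat model concrete, the following is a minimal sketch of a generic challenge-response ownership check. It is not the scheme proposed in the paper, whose details the abstract does not give. It assumes the server challenges the client on `k` randomly chosen bit positions of the file, and that an adversary guesses every bit it does not know uniformly and independently. All function names and parameters are hypothetical.

```python
import random


def make_challenge(file_len_bits: int, k: int, seed: int) -> list[int]:
    """Pick k bit positions (with replacement) from a seeded generator.

    Assumption: a real system would use a cryptographically secure
    source of randomness; random.Random only keeps the sketch reproducible.
    """
    rng = random.Random(seed)
    return [rng.randrange(file_len_bits) for _ in range(k)]


def get_bit(data: bytes, pos: int) -> int:
    """Return the bit at index pos, counting from the most significant bit of byte 0."""
    return (data[pos // 8] >> (7 - pos % 8)) & 1


def respond(data: bytes, positions: list[int]) -> list[int]:
    """Client side: answer the challenge with the requested bits."""
    return [get_bit(data, p) for p in positions]


def verify(data: bytes, positions: list[int], response: list[int]) -> bool:
    """Server side: compare the response with the stored copy of the file."""
    return response == respond(data, positions)


def cheat_success_probability(known_fraction: float, k: int) -> float:
    """Chance that an adversary knowing known_fraction of the bits passes.

    Assumption: each challenged bit is either known (answered correctly)
    or unknown (guessed correctly with probability 1/2), independently.
    """
    per_bit = known_fraction + (1.0 - known_fraction) / 2.0
    return per_bit ** k
```

Under these assumptions, an adversary who knows half of the file answers a single challenged bit correctly with probability 0.75, and the chance of passing shrinks geometrically as `k` grows. That is why a larger challenge makes partial knowledge of the file insufficient to claim ownership.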
