SecDep: A user-aware efficient fine-grained secure deduplication scheme with multi-level key management

Nowadays, many customers and enterprises back up their data to cloud storage that performs deduplication to save storage space and network bandwidth. Hence, how to perform secure deduplication becomes a critical challenge for cloud storage. According to our analysis, the state-of-the-art secure deduplication methods are not suitable for cross-user fine-grained data deduplication. They are either vulnerable to brute-force attacks that can recover files falling into a known set, or incur large computation (time) overheads. Moreover, existing approaches to convergent key management incur large space overheads because of the huge number of chunks shared among users. Our observation that cross-user redundant data come mainly from duplicate files motivates us to propose SecDep, an efficient secure deduplication scheme. SecDep employs User-Aware Convergent Encryption (UACE) and Multi-Level Key management (MLK) approaches. (1) UACE combines cross-user file-level and inside-user chunk-level deduplication, and exploits different security policies among and inside users to minimize the computation overheads. Specifically, both file-level and chunk-level deduplication use variants of Convergent Encryption (CE) to resist brute-force attacks. The major difference is that the file-level CE keys are generated by using a server-aided method to ensure security of cross-user deduplication, while the chunk-level keys are generated by using a user-aided method with lower computation overheads. (2) To reduce key space overheads, MLK uses file-level keys to encrypt chunk-level keys so that the key space will not increase with the number of sharing users. Furthermore, MLK splits the file-level keys into share-level keys and distributes them to multiple key servers to ensure security and reliability of file-level keys. Our security analysis demonstrates that SecDep ensures data confidentiality and key security. Our experimental results based on several large real-world datasets show that SecDep is more time-efficient and key-space-efficient than the state-of-the-art secure deduplication approaches.
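The three key levels described above can be illustrated with a minimal Python sketch. It is not the SecDep implementation, and every name in it (`file_key`, `chunk_key`, `wrap`, `split`, `recover`) is hypothetical. Several simplifying assumptions are made: the server-aided file-level key is stood in for by an HMAC under a key-server secret (a real server-aided protocol would typically hide the file hash from the server, which this stand-in does not); the user-aided chunk-level key is assumed to mix a per-user secret into the chunk hash; key wrapping is a toy XOR pad rather than an authenticated cipher; and the share-level split is assumed to be Shamir's (k, n) threshold scheme, which the abstract does not specify.

```python
import hashlib
import hmac
import secrets

def file_key(file_bytes: bytes, server_secret: bytes) -> bytes:
    """File-level key (assumed stand-in): identical files yield identical keys
    across users, but deriving a key requires the key server's secret."""
    digest = hashlib.sha256(file_bytes).digest()
    return hmac.new(server_secret, digest, hashlib.sha256).digest()

def chunk_key(chunk: bytes, user_secret: bytes) -> bytes:
    """Chunk-level key (assumed form): deterministic per user, computed locally."""
    digest = hashlib.sha256(chunk).digest()
    return hmac.new(user_secret, digest, hashlib.sha256).digest()

def wrap(key: bytes, wrapping_key: bytes, index: int) -> bytes:
    """Toy key wrap: XOR a 32-byte key with an HMAC-derived pad.
    Illustration only; a real system would use an authenticated cipher."""
    pad = hmac.new(wrapping_key, b"wrap" + index.to_bytes(8, "big"),
                   hashlib.sha256).digest()
    return bytes(a ^ b for a, b in zip(key, pad))

unwrap = wrap  # XOR with the same pad is its own inverse

PRIME = 2**521 - 1  # Mersenne prime, larger than any 256-bit key

def split(secret: bytes, n: int, k: int):
    """Shamir split (assumed scheme): any k of the n shares recover the secret."""
    coeffs = [int.from_bytes(secret, "big")]
    coeffs += [secrets.randbelow(PRIME) for _ in range(k - 1)]

    def poly(x: int) -> int:
        acc = 0
        for c in reversed(coeffs):  # Horner evaluation mod PRIME
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, poly(x)) for x in range(1, n + 1)]

def recover(shares, length: int = 32) -> bytes:
    """Lagrange interpolation at x = 0 over the prime field."""
    total = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        total = (total + yi * num * pow(den, -1, PRIME)) % PRIME
    return total.to_bytes(length, "big")
```

In this sketch a user stores only the wrapped chunk keys alongside the file, and the single file-level key is split across key servers, for example `split(fk, 5, 3)` for five servers of which any three suffice. The modular inverse via `pow(den, -1, PRIME)` requires Python 3.8 or later.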
