A Survey of Secure Data Deduplication Schemes for Cloud Storage Systems

Data deduplication has attracted many cloud service providers (CSPs) as a way to reduce storage costs. Although deduplication is increasingly accepted, it raises many security and privacy problems because cloud storage relies on outsourced data delivery models. To address specific security and privacy issues, secure deduplication techniques have been proposed for cloud data, leading to a diverse range of solutions and trade-offs. In this article, we survey ongoing research on secure deduplication for cloud data, focusing on the attack scenarios most widely exploited in cloud storage. Based on a classification of deduplication systems, we explore security risks and attack scenarios involving both inside and outside adversaries. We then describe state-of-the-art secure deduplication techniques for each approach; these address different security issues under specific or combined threat models and include both cryptographic and protocol solutions. We discuss and compare the schemes in terms of security and efficiency with respect to their different security goals. Finally, we identify and discuss unresolved issues and further research challenges for secure deduplication in cloud storage.
