Distributed Key Generation for Encrypted Deduplication: Achieving the Strongest Privacy

Large-scale cloud storage systems often pursue two seemingly conflicting goals: (1) the system needs to eliminate redundant copies of data to save space, a process called deduplication; and (2) users demand that their data be encrypted to ensure privacy. Conventional encryption makes deduplication on ciphertexts ineffective because it destroys data redundancy. A line of work, originating with Convergent Encryption [27] and evolving into Message Locked Encryption [13] and, most recently, the DupLESS architecture [12], strives to solve this problem. DupLESS relies on a key server to help clients generate encryption keys that result in convergent ciphertexts. In this paper, we first introduce a new security notion appropriate for the deduplication setting and show that it is strictly stronger than all relevant notions. We then provide a rigorous proof, in the random oracle model, that the DupLESS architecture is secure under this notion; such a proof is lacking in the original paper. Our proof shows that using an additional secret, other than the data itself, to generate encryption keys achieves the best possible security under the current deduplication paradigm. We also introduce a distributed protocol that eliminates the need for the key server. This not only provides better protection but also allows less managed systems, such as P2P systems, to enjoy the same high level of security. Implementation and evaluation show that the scheme is both robust and practical.
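The difference between deriving a key from the data alone and deriving it from the data plus an additional secret can be illustrated with a minimal sketch. This is not the paper's protocol or the DupLESS implementation: the function names, the use of SHA-256, and the use of HMAC as the keyed function are all assumptions made for illustration. The sketch also omits the interactive step by which a client would obtain its key without revealing the data to the key holder.

```python
import hashlib
import hmac


def convergent_key(data: bytes) -> bytes:
    """Convergent-style key: derived from the data alone (hypothetical helper)."""
    return hashlib.sha256(data).digest()


def server_aided_key(data: bytes, server_secret: bytes) -> bytes:
    """Key that also depends on a secret unknown to the storage provider.

    HMAC-SHA256 is a stand-in keyed function chosen for this sketch only.
    """
    digest = hashlib.sha256(data).digest()
    return hmac.new(server_secret, digest, hashlib.sha256).digest()


def guess_plaintext(target_key, candidates, derive):
    """Offline dictionary attack: return the candidate whose derived key matches."""
    for candidate in candidates:
        if hmac.compare_digest(derive(candidate), target_key):
            return candidate
    return None


if __name__ == "__main__":
    secret = b"\x01" * 32  # hypothetical additional secret
    file_a = b"salary: 50000"
    guesses = [b"salary: 40000", b"salary: 50000", b"salary: 60000"]

    # Identical data yields identical keys in both schemes, so
    # deduplication on the resulting ciphertexts remains possible.
    assert convergent_key(file_a) == convergent_key(bytes(file_a))
    assert server_aided_key(file_a, secret) == server_aided_key(bytes(file_a), secret)

    # Predictable data is recovered from a data-only key by brute force.
    print(guess_plaintext(convergent_key(file_a), guesses, convergent_key))

    # Without the secret, the attacker's own derivation never matches.
    print(guess_plaintext(server_aided_key(file_a, secret), guesses, convergent_key))
```

In this toy setting the first attack recovers the file and the second returns `None`, which is the intuition behind adding a secret to key generation: an attacker who sees only ciphertexts can no longer test guesses offline.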

[1] David Chaum, et al. Blind Signatures for Untraceable Payments, 1982, CRYPTO.

[2] Susan K. Langford. Threshold DSS Signatures without a Trusted Party, 1995, CRYPTO.

[3] Brian Warner, et al. Tahoe: the least-authority filesystem, 2008, StorageSS '08.

[4] Yafei Dai, et al. PeerDedupe: Insights into the Peer-Assisted Sampling Deduplication, 2010, 2010 IEEE Tenth International Conference on Peer-to-Peer Computing (P2P).

[5] Serge Fehr, et al. On Notions of Security for Deterministic Encryption, and Efficient Constructions without Random Oracles, 2008, CRYPTO.

[6] Ilya Mironov, et al. Differentially private recommender systems: building privacy into the net, 2009, KDD.

[7] Mihir Bellare, et al. A concrete security treatment of symmetric encryption, 1997, Proceedings 38th Annual Symposium on Foundations of Computer Science.

[8] Mihir Bellare, et al. Message-Locked Encryption and Secure Deduplication, 2013, EUROCRYPT.

[9] Cynthia Dwork, et al. An Ad Omnia Approach to Defining and Achieving Private Data Analysis, 2007, PinKDD.

[10] Cynthia Dwork, et al. Privacy, accuracy, and consistency too: a holistic solution to contingency table release, 2007, PODS.

[11] Irit Dinur, et al. Revealing information while preserving privacy, 2003, PODS.

[12] Ivan Damgård, et al. Efficient, Robust and Constant-Round Distributed RSA Key Generation, 2010, TCC.

[13] Gultekin Özsoyoglu, et al. Auditing for secure statistical databases, 1981, ACM '81.

[14] Cynthia Dwork, et al. Calibrating Noise to Sensitivity in Private Data Analysis, 2006, TCC.

[15] Marvin Theimer, et al. Reclaiming space from duplicate files in a serverless distributed file system, 2002, Proceedings 22nd International Conference on Distributed Computing Systems.

[16] Yitao Duan, et al. P4P: Practical Large-Scale Privacy-Preserving Distributed Computation Robust against Malicious Users, 2010, USENIX Security Symposium.

[17] Ronald Cramer, et al. A Practical Public Key Cryptosystem Provably Secure Against Adaptive Chosen Ciphertext Attack, 1998, CRYPTO.

[18] Yitao Duan, et al. Protecting User Data in Ubiquitous Computing: Towards Trustworthy Environments, 2004, Privacy Enhancing Technologies.

[19] Marvin Theimer, et al. Feasibility of a serverless distributed file system deployed on an existing set of desktop PCs, 2000, SIGMETRICS '00.

[20] Silvio Micali, et al. A Digital Signature Scheme Secure Against Adaptive Chosen-Message Attacks, 1988, SIAM J. Comput.

[21] Yitao Duan, et al. How to Construct Multicast Cryptosystems Provably Secure Against Adaptive Chosen Ciphertext Attack, 2006, CT-RSA.

[22] Moti Yung, et al. Robust efficient distributed RSA-key generation, 1998, STOC '98.

[23] Yang Zhang, et al. Liquid: A Scalable Deduplication File System for Virtual Machine Images, 2014, IEEE Transactions on Parallel and Distributed Systems.

[24] Mihir Bellare, et al. DupLESS: Server-Aided Encryption for Deduplicated Storage, 2013, USENIX Security Symposium.

[25] Moni Naor, et al. Our Data, Ourselves: Privacy Via Distributed Noise Generation, 2006, EUROCRYPT.

[26] Christian Grothoff, et al. Efficient Sharing of Encrypted Data, 2002, ACISP.

[27] Irfan Ahmad, et al. Decentralized Deduplication in SAN Cluster File Systems, 2009, USENIX Annual Technical Conference.

[28] David Pointcheval, et al. (Semantic Security and Pseudo-Random Permutations), 2004.

[29] Mihir Bellare, et al. OCB: a block-cipher mode of operation for efficient authenticated encryption, 2001, CCS '01.

[30] Cynthia Dwork, et al. Calibrating Noise to Sensitivity in Private Data Analysis, 2016, J. Priv. Confidentiality.

[31] Sofya Raskhodnikova, et al. Smooth sensitivity and sampling in private data analysis, 2007, STOC '07.

[32] Adam O'Neill, et al. Deterministic Encryption: Definitional Equivalences and Constructions without Random Oracles, 2008, CRYPTO.

[33] Mihir Bellare, et al. Deterministic and Efficiently Searchable Encryption, 2007, CRYPTO.

[34] Silvio Micali, et al. Probabilistic encryption & how to play mental poker keeping secret all partial information, 1982, STOC '82.

[35] Ivan Damgård, et al. Practical Threshold RSA Signatures without a Trusted Dealer, 2000, EUROCRYPT.

[36] Jacques Stern, et al. Fully Distributed Threshold RSA under Standard Assumptions, 2001, ASIACRYPT.

[37] Silvio Micali, et al. Probabilistic Encryption, 1984, J. Comput. Syst. Sci.

[38] Matthew K. Franklin, et al. Efficient generation of shared RSA keys, 2001, JACM.

[39] Yitao Duan. Privacy without noise, 2009, CIKM.

[40] Cynthia Dwork, et al. Practical privacy: the SuLQ framework, 2005, PODS.

[41] David Pointcheval, et al. About the Security of Ciphers (Semantic Security and Pseudo-Random Permutations), 2004, Selected Areas in Cryptography.

[42] Victor Shoup, et al. Practical Threshold Signatures, 2000, EUROCRYPT.

[43] Silvio Micali, et al. The knowledge complexity of interactive proof-systems, 1985, STOC '85.

[44] Darrell D. E. Long, et al. Strong Security for Network-Attached Storage, 2002, FAST.

[45] Le Zhang, et al. Fast and Secure Laptop Backups with Encrypted De-duplication, 2010, LISA.