On Information Leakage in Deduplicated Storage Systems

Most existing cloud storage providers rely on data deduplication to significantly reduce storage costs by storing duplicate data only once. While the literature has thoroughly analyzed client-side information leakage associated with the use of data deduplication techniques in the cloud, no previous work has analyzed the information leakage associated with access trace information (e.g., object size and timing) that is available whenever a client uploads a file to a curious cloud provider. In this paper, we address this problem and analyze information leakage associated with data deduplication on a curious storage server. We show that even if the data is encrypted using a key not known by the storage server, the latter can still acquire considerable information about the stored files and even determine which files are stored. We validate our results both analytically and experimentally using a number of real storage datasets.
