Secure data deduplication

As the world moves to digital storage for archival purposes, there is an increasing demand for systems that can provide secure data storage in a cost-effective manner. By identifying common chunks of data both within and between files and storing them only once, deduplication can yield cost savings by increasing the utility of a given amount of storage. Unfortunately, deduplication exploits identical content, while encryption attempts to make all content appear random; the same content encrypted with two different keys results in very different ciphertext. Thus, combining the space efficiency of deduplication with the secrecy aspects of encryption is problematic. We have developed a solution that provides both data security and space efficiency in single-server storage and distributed storage systems. Encryption keys are generated in a consistent manner from the chunk data; thus, identical chunks will always encrypt to the same ciphertext. Furthermore, the keys cannot be deduced from the encrypted chunk data. Since the information each user needs to access and decrypt the chunks that make up a file is encrypted using a key known only to the user, even a full compromise of the system cannot reveal which chunks are used by which users.
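The key-derivation idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the choice of SHA-256, the toy hash-based stream cipher (a stand-in for a real cipher such as AES), the fixed-size chunking, and every name below (`derive_key`, `ChunkStore`, `store_file`, `read_file`) are assumptions made for illustration only.

```python
import hashlib
import hmac
import json


def _xor_stream(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher (assumption, NOT for production): XOR the data
    with a keystream built from SHA-256(key || counter)."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def derive_key(chunk: bytes) -> bytes:
    """Derive the encryption key from the chunk content itself, so that
    identical chunks always yield the same key."""
    return hashlib.sha256(chunk).digest()


class ChunkStore:
    """Hypothetical chunk store: ciphertext indexed by its own hash."""

    def __init__(self):
        self.chunks = {}

    def put(self, ciphertext: bytes) -> str:
        chunk_id = hashlib.sha256(ciphertext).hexdigest()
        self.chunks.setdefault(chunk_id, ciphertext)  # stored only once
        return chunk_id

    def get(self, chunk_id: str) -> bytes:
        return self.chunks[chunk_id]


def store_file(store, user_key: bytes, name: str, data: bytes, chunk_size: int = 4096) -> bytes:
    """Encrypt and store each chunk, then return the per-file map of
    (chunk id, chunk key) pairs sealed under a key only the user knows."""
    entries = []
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        key = derive_key(chunk)
        chunk_id = store.put(_xor_stream(key, chunk))
        entries.append([chunk_id, key.hex()])
    map_key = hmac.new(user_key, name.encode(), hashlib.sha256).digest()
    return _xor_stream(map_key, json.dumps(entries).encode())


def read_file(store, user_key: bytes, name: str, sealed_map: bytes) -> bytes:
    """Unseal the map with the user's key, then fetch and decrypt chunks."""
    map_key = hmac.new(user_key, name.encode(), hashlib.sha256).digest()
    entries = json.loads(_xor_stream(map_key, sealed_map))
    return b"".join(
        _xor_stream(bytes.fromhex(key_hex), store.get(chunk_id))
        for chunk_id, key_hex in entries
    )
```

In this sketch, two users who store the same data produce identical ciphertext chunks, so the store keeps one copy, while each user's sealed map differs because it is encrypted under that user's own key. The store itself holds only ciphertext and never sees the chunk keys or the maps in the clear.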
