Secure Deduplication of General Computations

The world's fast-growing data has become highly concentrated on enterprise or cloud storage servers. Data deduplication reduces redundancy in this data, saving storage and simplifying management. While existing systems can deduplicate computations on this data by memoizing and reusing computation results, they are insecure, not general, or slow. This paper presents UNIC, a system that securely deduplicates general computations. It exports a cache service that allows applications running on behalf of mutually distrusting users on local or remote hosts to memoize and reuse computation results. Key in UNIC are three new ideas. First, through a novel use of code attestation, UNIC achieves both integrity and secrecy. Second, it provides a simple yet expressive API that enables applications to deduplicate their own rich computations. This design is much more general and flexible than existing systems that can deduplicate only specific types of computations. Third, UNIC explores a cross-layer design that allows the underlying storage system to expose data deduplication information to the applications for better performance. Evaluation of UNIC on four popular open-source applications shows that UNIC is easy to use, fast, and with little storage overhead.

[1]  Manuel Blum,et al.  Software reliability via run-time result-checking , 1997, JACM.

[2]  Michael Isard,et al.  DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language , 2008, OSDI.

[3]  Elaine Shi,et al.  Pioneer: verifying code integrity and enforcing untampered code execution on legacy systems , 2005, SOSP '05.

[4]  DONALD MICHIE,et al.  “Memo” Functions and Machine Learning , 1968, Nature.

[5]  William A. Arbaugh,et al.  A secure and reliable bootstrap architecture , 1997, Proceedings. 1997 IEEE Symposium on Security and Privacy (Cat. No.97CB36097).

[6]  Derek Bruening,et al.  AddressSanitizer: A Fast Address Sanity Checker , 2012, USENIX Annual Technical Conference.

[7]  Dutch T. Meyer,et al.  A study of practical deduplication , 2011, TOS.

[8]  Miguel Castro,et al.  Baggy Bounds Checking: An Efficient and Backwards-Compatible Defense against Out-of-Bounds Errors , 2009, USENIX Security Symposium.

[9]  William J. Bolosky,et al.  Single instance storage in Windows® 2000 , 2000 .

[10]  Cezary Dubnicki,et al.  HydraFS: A High-Throughput File System for the HYDRAstor Content-Addressable Storage System , 2010, FAST.

[11]  Lenin Ravindranath,et al.  Nectar: Automatic Management of Data and Computation in Datacenters , 2010, OSDI.

[12]  Michal Kaczmarczyk,et al.  HYDRAstor: A Scalable Secondary Storage , 2009, FAST.

[13]  Allan Heydon,et al.  Caching function calls using precise dependencies , 2000, PLDI '00.

[14]  Udi Manber,et al.  Finding Similar Files in a Large File System , 1994, USENIX Winter.

[15]  William Pugh,et al.  Incremental computation via function caching , 1989, POPL '89.

[16]  Yanhong A. Liu,et al.  Static caching for incremental computation , 1998, TOPL.

[17]  Elaine Shi,et al.  BIND: a fine-grained attestation service for secure distributed systems , 2005, 2005 IEEE Symposium on Security and Privacy (S&P'05).

[18]  References , 1971 .

[19]  Sanjeev Kumar,et al.  Finding a Needle in Haystack: Facebook's Photo Storage , 2010, OSDI.

[20]  Pramod Bhatotia,et al.  Incoop: MapReduce for incremental computations , 2011, SoCC.

[21]  Galen C. Hunt,et al.  Shielding Applications from an Untrusted Cloud with Haven , 2014, OSDI.

[22]  Angelos D. Keromytis,et al.  Automated Recovery in a Secure Bootstrap Process , 1998, NDSS.

[23]  William J. Bolosky,et al.  Single Instance Storage in Windows , 2000 .