Auditable versioned data storage outsourcing

Auditability is crucial for data outsourcing: it facilitates accountability and helps identify data loss or corruption incidents in a timely manner, which in turn reduces the risks from such losses. In recent years, in step with the growing trend of outsourcing, much progress has been made in designing probabilistic (for efficiency) provable data possession (PDP) schemes. However, even recent and advanced PDP solutions that deal with dynamic data do so in a limited manner, and only for the latest version of the data. A naive solution that treats different versions in isolation would work, but it incurs tremendous overhead and is therefore undesirable. In this paper, we present algorithms to achieve full persistence (all intermediate configurations are preserved and are modifiable) for an optimized skip list (known as FlexList) so that versioned data can be audited. The proposed scheme provides deduplication at the level of logical, variable-sized blocks, so that only the altered parts of the different versions are kept. The persistent data structure allows any arbitrary version to be accessed (read) with the same storage and processing efficiency that state-of-the-art dynamic PDP solutions provide for only the current version, while commit (write) operations incur around 5% additional time. Furthermore, the time overhead for auditing arbitrary versions in addition to the latest version is imperceptible even on a low-end server. Additionally, applying our approach opens up the possibility of naturally supporting block-level deduplication. Whereas a naive solution for auditing versions would copy the whole data and the data structure for each version, our solution uses storage space very close to that of the most efficient delta-based solutions.
Accordingly, we explore how the proposed data structure benefits the system with block-level deduplication in addition to providing auditability, and how it can be integrated with a state-of-the-art versioning system (Git), thereby improving the storage efficiency of Git and helping scale the size of data that can be stored in Git without compromising the retrieval efficiency of arbitrary versions. Algorithms realizing a persistent data structure to support versioning are proposed. The function of auditing the latest version is extended to audit all versions of data. Unlike delta-based versioning, the expected worst case is significantly better. Unlike tree-based approaches, this is achieved without any re-balancing operation. Our approach realizes block-level deduplication across versions of data.
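
To make the idea of block-level deduplication across versions concrete, the following is a minimal sketch in Python. It is not the persistent FlexList proposed in the paper: it swaps in a plain content-addressed block table, and the names `VersionedBlockStore`, `commit`, `read`, and `check_block` are hypothetical, introduced only for illustration. It shows that each version can be kept as a sequence of block digests, so that unchanged blocks are stored once and any version can still be read and spot-checked.

```python
import hashlib


class VersionedBlockStore:
    """Illustrative stand-in (assumption): a content-addressed block table,
    NOT the persistent skip list (FlexList) described in the abstract."""

    def __init__(self):
        self.blocks = {}    # SHA-256 hex digest -> block bytes, stored once
        self.versions = []  # one tuple of digests per committed version

    def commit(self, blocks):
        """Store a new version given as a list of variable-sized byte blocks."""
        digests = []
        for block in blocks:
            digest = hashlib.sha256(block).hexdigest()
            # A block already present in any earlier version is not stored again.
            self.blocks.setdefault(digest, block)
            digests.append(digest)
        self.versions.append(tuple(digests))
        return len(self.versions) - 1  # version identifier

    def read(self, version):
        """Reassemble the full contents of an arbitrary version."""
        return b"".join(self.blocks[d] for d in self.versions[version])

    def check_block(self, version, index):
        """Recompute one block's hash and compare it with the recorded digest.
        This is a local integrity check, not a cryptographic PDP proof."""
        digest = self.versions[version][index]
        return hashlib.sha256(self.blocks[digest]).hexdigest() == digest


store = VersionedBlockStore()
v0 = store.commit([b"alpha", b"beta", b"gamma"])
v1 = store.commit([b"alpha", b"BETA", b"gamma"])  # only the middle block changed
```

In this sketch the two versions together hold six block references but only four distinct blocks are stored. A real scheme would additionally authenticate the per-version index so that an untrusted server can prove possession; that is the role the persistent authenticated skip list plays in the paper.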
