Keywords: Control, Branch-based, Data security, Integrity, Consistency, Merge semantics, Document hosting, Git, Collaborative analytics, Blockchain

Existing data storage systems offer a wide range of functionalities to accommodate an equally diverse range of applications. However, new classes of applications have emerged, e.g., blockchain and collaborative analytics, that feature data versioning, fork semantics, tamper evidence, or any combination thereof. They present new opportunities for storage systems to support such applications efficiently by embedding these requirements into the storage layer. In this paper, we present ForkBase, a storage engine specifically designed to provide efficient support for blockchain and forkable applications. By integrating the core application properties into the storage, ForkBase not only delivers high performance but also reduces development effort. Data in ForkBase is multi-versioned, and each version uniquely identifies both the data content and its history. ForkBase supports two variants of fork semantics to facilitate diverse collaboration workflows. A novel index structure efficiently identifies and eliminates duplicate content across data objects. Consequently, ForkBase is efficient not only in performance but also in space usage. We demonstrate the performance of ForkBase using three applications: a blockchain platform, a wiki engine, and a collaborative analytics application. We conduct an extensive experimental evaluation of these applications against their respective state-of-the-art systems. The results show that ForkBase achieves superior performance while significantly lowering development cost.
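Two of the ideas above can be illustrated with a small sketch: version identifiers that commit to both content and history, and deduplication of identical content across versions. This is a minimal Python sketch under stated assumptions, not ForkBase's implementation. The class `ToyForkStore`, the helpers `chunk` and `version_id`, and the parameters `WINDOW` and `MASK` are all hypothetical. Deduplication here uses plain content-defined chunking with a rolling hash, a well-known technique swapped in as a stand-in; it is not ForkBase's own index structure. Only a single explicit fork operation is modeled, not the two fork variants mentioned above.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field

# Hypothetical toy store for illustration only; not ForkBase's API or index.
WINDOW = 16              # rolling-hash window in bytes (assumed)
MASK = (1 << 6) - 1      # cut where the low 6 bits of the hash are 0 (~1/64 chance per byte)
BASE, MOD = 257, (1 << 61) - 1


def chunk(data: bytes) -> list[bytes]:
    """Content-defined chunking: a cut depends on the last WINDOW bytes
    (plus a WINDOW-byte minimum chunk size), not on absolute offsets."""
    chunks, start, h = [], 0, 0
    drop = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window
    for i, b in enumerate(data):
        if i - start >= WINDOW:
            h = (h - data[i - WINDOW] * drop) % MOD
        h = (h * BASE + b) % MOD
        if i - start + 1 >= WINDOW and (h & MASK) == 0:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def version_id(chunk_ids: list[str], parents: list[str]) -> str:
    """A version id commits to the content (chunk hashes) and its history (parent ids)."""
    payload = ",".join(chunk_ids) + "#" + ",".join(parents)
    return hashlib.sha256(payload.encode()).hexdigest()


@dataclass
class ToyForkStore:
    chunks: dict[str, bytes] = field(default_factory=dict)  # chunk hash -> bytes, stored once
    versions: dict[str, tuple[list[str], list[str]]] = field(default_factory=dict)
    heads: dict[str, str] = field(default_factory=dict)     # branch name -> latest version id

    def put(self, branch: str, data: bytes) -> str:
        ids = []
        for c in chunk(data):
            cid = hashlib.sha256(c).hexdigest()
            self.chunks.setdefault(cid, c)  # identical chunks across versions stored once
            ids.append(cid)
        parents = [self.heads[branch]] if branch in self.heads else []
        vid = version_id(ids, parents)
        self.versions[vid] = (ids, parents)
        self.heads[branch] = vid
        return vid

    def fork(self, src: str, dst: str) -> None:
        """Create branch dst pointing at the current head of src (an explicit fork)."""
        self.heads[dst] = self.heads[src]

    def get(self, vid: str) -> bytes:
        return b"".join(self.chunks[c] for c in self.versions[vid][0])

    def verify(self, vid: str) -> bool:
        """Tamper check: recompute this version's chunk hashes and its own id."""
        ids, parents = self.versions[vid]
        intact = all(hashlib.sha256(self.chunks[c]).hexdigest() == c for c in ids)
        return intact and version_id(ids, parents) == vid
```

Because cut points depend only on local content, inserting a few bytes into an object changes only the chunks around the edit, so a forked version shares most of its chunks with its parent. Hashing the parent ids into each version id means that identical content reached through different histories yields distinct versions, and any modification to a stored chunk is detected by `verify`.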
