Analysis of Indexing Structures for Immutable Data

In emerging applications such as blockchains and collaborative data analytics, there is strong demand for data immutability, multi-version access, and tamper-evident control. To support efficient lookup and merge operations, three index structures for immutable data have been proposed: Merkle Patricia Trie (MPT), Merkle Bucket Tree (MBT), and Pattern-Oriented-Split Tree (POS-Tree). Although these structures have been adopted in real applications, the literature offers no systematic evaluation of their pros and cons, which makes it difficult for practitioners to choose the right index structure for their applications. To address this problem, we present a comprehensive analysis of the existing index structures for immutable data and evaluate both their asymptotic and empirical performance. Specifically, we show that MPT, MBT, and POS-Tree are all instances of a recently proposed framework, dubbed Structurally Invariant and Reusable Indexes (SIRI). We evaluate the SIRI instances on their index performance and deduplication capability. We establish the worst-case guarantees of each index and experimentally evaluate all indexes in a wide variety of settings. Based on our theoretical and empirical analysis, we conclude that POS-Tree is a favorable choice for indexing immutable data.
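
To make the notions of tamper evidence and structural invariance concrete, the following is a minimal sketch, not an implementation of MPT, MBT, or POS-Tree. It assumes a plain binary Merkle tree over key-sorted records, and the names `merkle_root` and `_h` are hypothetical. It illustrates two properties that, as we read the abstract, the SIRI instances share: the digest depends only on the set of records and not on the order in which they were inserted, and any change to a record changes the digest.

```python
import hashlib
from typing import Dict


def _h(data: bytes) -> bytes:
    """SHA-256 digest, used for both leaf and internal nodes (an assumption of this sketch)."""
    return hashlib.sha256(data).digest()


def merkle_root(records: Dict[str, str]) -> str:
    """Return a hex digest over all records.

    Leaves are built in sorted key order, so the tree shape and the
    root hash depend only on the content of `records`.
    """
    # Length-prefix the key so that ("a", "b=c") and ("a=b", "c") hash differently.
    level = [_h(f"{len(k)}:{k}={v}".encode()) for k, v in sorted(records.items())]
    if not level:
        return _h(b"").hex()
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # pad odd levels by repeating the last node
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0].hex()


# Two versions built with different insertion orders hold the same records.
v1: Dict[str, str] = {}
for key, value in [("alice", "10"), ("bob", "20"), ("carol", "30")]:
    v1[key] = value

v2: Dict[str, str] = {}
for key, value in [("carol", "30"), ("alice", "10"), ("bob", "20")]:
    v2[key] = value

same_root = merkle_root(v1) == merkle_root(v2)

# Changing a single value yields a different root, which exposes the tampering.
tampered = dict(v1, bob="21")
tamper_detected = merkle_root(tampered) != merkle_root(v1)
```

Because identical content always produces identical hashes, two versions that share records can in principle share the corresponding nodes. This is the intuition behind the deduplication capability mentioned above; the sketch recomputes the whole tree and does not attempt node sharing.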
