Cache-oblivious string B-trees

B-trees are the data structure of choice for maintaining searchable data on disk. However, B-trees perform suboptimally when keys are long or of variable length; when keys are compressed, even when using front compression, the standard B-tree compression scheme; for range queries; and with respect to memory effects such as disk prefetching.

This paper presents a cache-oblivious string B-tree (COSB-tree) data structure that is efficient in all these ways. The COSB-tree searches asymptotically optimally and inserts and deletes nearly optimally. It maintains an index whose size is proportional to the front-compressed size of the dictionary. Furthermore, unlike standard front-compressed strings, keys can be decompressed in a memory-efficient manner. It performs range queries with no extra disk seeks; in contrast, B-trees incur disk seeks when skipping from leaf block to leaf block. It utilizes all levels of a memory hierarchy efficiently and makes good use of disk locality by using cache-oblivious layout strategies.
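As a point of reference for the front compression mentioned above, here is a minimal sketch of the plain scheme: each key in a sorted list is stored as the length of the prefix it shares with its predecessor plus the remaining suffix. The function names `front_compress` and `front_decompress` are illustrative assumptions, not names from the paper, and the sketch shows only the standard scheme, not the COSB-tree's own layout.

```python
import os


def front_compress(sorted_keys):
    """Encode each key as (shared_prefix_length, remaining_suffix).

    Assumes sorted_keys is already in lexicographic order, so that
    neighbouring keys tend to share long prefixes.
    """
    encoded = []
    previous = ""
    for key in sorted_keys:
        # Length of the prefix shared with the preceding key.
        lcp = len(os.path.commonprefix([previous, key]))
        encoded.append((lcp, key[lcp:]))
        previous = key
    return encoded


def front_decompress(encoded):
    """Rebuild all keys by replaying the encoding from the first entry."""
    keys = []
    previous = ""
    for lcp, suffix in encoded:
        key = previous[:lcp] + suffix
        keys.append(key)
        previous = key
    return keys


keys = ["car", "card", "care", "cat", "dog"]
encoded = front_compress(keys)
print(encoded)
print(front_decompress(encoded) == keys)
```

In this naive form, recovering a single key means replaying earlier entries until a key with no shared prefix is reached, which may touch a long stretch of the compressed data. That cost is presumably what the abstract contrasts with when it says COSB-tree keys can be decompressed in a memory-efficient manner.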
