Incrementally maintaining run-length encoded attributes in column stores

Run-length encoding is a popular compression scheme that is used extensively to compress attribute values in column stores. Out-of-order insertion of tuples can degrade the compression achieved by run-length encoding and, consequently, the performance of reads. In-place insertions, deletions, and updates of tuples in a column-store relation with n tuples take O(n) time. This linear cost is typically avoided by amortizing the cost of updates over batches. However, the relation is decompressed and subsequently re-compressed after a batch of updates is applied, which adds to the time complexity. We propose a novel indexing scheme called count indexes that supports O(log n) in-place insertions, deletions, updates, and lookups on a run-length encoded sequence with n runs. We also show that count indexes update a batch of tuples efficiently, requiring almost constant time per updated tuple. Additionally, we show that count indexes are optimal. We extend count indexes to support O(log n) updates on bitmapped sequences with n values and adapt them to block-based stores.
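To make the idea of logarithmic-time positional access on a run-length encoded sequence concrete, the following is a minimal sketch and not the authors' implementation. It assumes a tree in which every node holds one run and the total number of tuples in its subtree, and it swaps in a randomized treap for balancing, so the O(log n) bounds hold only in expectation. All names (`Node`, `split`, `merge`, `lookup`, `insert`, `runs`) are hypothetical. For brevity the sketch does not coalesce adjacent runs that hold the same value, which a real index would need to do to preserve compression.

```python
import random

class Node:
    """One run (value repeated `length` times) plus a subtree tuple count."""
    def __init__(self, value, length):
        self.value, self.length = value, length
        self.prio = random.random()   # treap priority (assumed balancing scheme)
        self.left = self.right = None
        self.total = length           # number of tuples stored in this subtree

def total(t):
    return t.total if t else 0

def pull(t):
    """Recompute the subtree count after a child pointer or length changed."""
    t.total = t.length + total(t.left) + total(t.right)
    return t

def merge(a, b):
    """Concatenate two trees; every tuple of `a` precedes every tuple of `b`."""
    if not a or not b:
        return a or b
    if a.prio > b.prio:
        a.right = merge(a.right, b)
        return pull(a)
    b.left = merge(a, b.left)
    return pull(b)

def split(t, pos):
    """Return (first `pos` tuples, remaining tuples), cutting a run if needed."""
    if not t:
        return None, None
    left_total = total(t.left)
    if pos <= left_total:
        a, b = split(t.left, pos)
        t.left = b
        return a, pull(t)
    if pos >= left_total + t.length:
        a, b = split(t.right, pos - left_total - t.length)
        t.right = a
        return pull(t), b
    k = pos - left_total              # pos falls strictly inside this run
    tail = Node(t.value, t.length - k)
    right = merge(tail, t.right)
    t.length, t.right = k, None
    return pull(t), right

def lookup(t, pos):
    """Return the value of the tuple at 0-based position `pos`."""
    while t:
        left_total = total(t.left)
        if pos < left_total:
            t = t.left
        elif pos < left_total + t.length:
            return t.value
        else:
            pos -= left_total + t.length
            t = t.right
    raise IndexError(pos)

def insert(t, pos, value):
    """Insert one tuple so that it ends up at position `pos`."""
    a, b = split(t, pos)
    return merge(merge(a, Node(value, 1)), b)

def runs(t):
    """In-order list of (value, length) runs, for inspection only."""
    return runs(t.left) + [(t.value, t.length)] + runs(t.right) if t else []

# Usage: build the sequence aaa bb aaaa, then insert "c" at position 1.
root = None
for value, length in [("a", 3), ("b", 2), ("a", 4)]:
    root = merge(root, Node(value, length))
root = insert(root, 1, "c")
print(runs(root))
```

Each operation touches only the nodes on one root-to-leaf path, so the sequence is never decompressed; the per-node counts are what allow a position to be located without scanning the runs.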
