Compressed Prefix Sums

We consider the prefix sums problem. Given a (static) sequence of positive integers $\vec{x} = (x_1, \ldots, x_n)$ with $\sum_{i=1}^n x_i = m$, we wish to support the operation ${\sf sum}(\vec{x},j)$, which returns $\sum_{i=1}^{j} x_i$. Our aim is to minimise the space required to store $\vec{x}$, where "minimal space" is defined by some compressibility criterion, while supporting sum as rapidly as possible. There are two main compressibility criteria. (a) The succinct space bound, $B(m, n) = \lceil \log_2 \binom{m-1}{n-1} \rceil$ bits, applies to any sequence $\vec{x}$ whose elements add up to $m$. (b) Data-aware measures depend on the values in $\vec{x}$ and can be lower than the succinct bound for some sequences. Appropriate data-aware measures have been studied extensively in the information retrieval (IR) community [17]. We demonstrate a close connection between the succinct bound and the data-aware measure that performs best in practice for an important IR application. We give theoretical solutions that use space close to other data-aware compressibility measures (often within $o(n)$ bits) and support sum in doubly-logarithmic (or better) time, and we evaluate practical variants of these solutions experimentally.

A bit-vector is a data structure that supports the rank and select operations on a bit-string: rank counts the 1s up to a given position, and select returns the position of the $j$-th 1. Bit-vectors are fundamental to succinct and compressed data structures. We describe a new bit-vector that is robust and efficient.
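The link between prefix sums and bit-vectors can be illustrated with a minimal sketch. It relies on a standard reduction that is not necessarily the construction used in this paper. The sequence $\vec{x}$ is stored as a bit-string of length $m$ with a 1 at each (0-based) position $\sum_{i \le j} x_i - 1$, so that ${\sf sum}(\vec{x}, j) = {\sf select}_1(j) + 1$. The last bit is always 1, so only $n-1$ of the first $m-1$ positions are free, which is where the $\binom{m-1}{n-1}$ in $B(m,n)$ comes from.

All names here (`BitVector`, `rank1`, `select1`, `encode`, `prefix_sum`, `succinct_bound`, and the `block` parameter) are hypothetical. The bit-vector is a plain uncompressed one that uses per-block cumulative counts for rank and a binary search over rank for select. It is not the new bit-vector described above.

```python
import math
from typing import List


def succinct_bound(m: int, n: int) -> int:
    """B(m, n) = ceil(log2 C(m-1, n-1)): bits needed to tell apart every
    sequence of n positive integers summing to m."""
    count = math.comb(m - 1, n - 1)
    # For count >= 1, ceil(log2(count)) == (count - 1).bit_length(), computed exactly.
    return (count - 1).bit_length()


class BitVector:
    """Plain bit-vector with rank/select (hypothetical illustration only)."""

    def __init__(self, bits: List[int], block: int = 64):
        self.bits = bits
        self.block = block
        # block_rank[k] = number of 1s in bits[0 : k * block]
        self.block_rank = [0]
        for start in range(0, len(bits), block):
            self.block_rank.append(self.block_rank[-1] + sum(bits[start:start + block]))

    def rank1(self, i: int) -> int:
        """Number of 1s in bits[0:i]."""
        k = i // self.block
        return self.block_rank[k] + sum(self.bits[k * self.block:i])

    def select1(self, j: int) -> int:
        """0-based position of the j-th 1, for 1 <= j <= total number of 1s."""
        if not 1 <= j <= self.rank1(len(self.bits)):
            raise ValueError("j out of range")
        lo, hi = 0, len(self.bits) - 1
        # Smallest p with rank1(p + 1) >= j; that p holds the j-th 1.
        while lo < hi:
            mid = (lo + hi) // 2
            if self.rank1(mid + 1) >= j:
                hi = mid
            else:
                lo = mid + 1
        return lo


def encode(xs: List[int], block: int = 64) -> BitVector:
    """Mark each prefix sum s_j at bit position s_j - 1 in an m-bit string."""
    bits = [0] * sum(xs)
    s = 0
    for x in xs:
        if x <= 0:
            raise ValueError("elements must be positive")
        s += x
        bits[s - 1] = 1
    return BitVector(bits, block)


def prefix_sum(bv: BitVector, j: int) -> int:
    """sum(x, j) = select1(j) + 1, with sum(x, 0) = 0."""
    return 0 if j == 0 else bv.select1(j) + 1
```

For example, with $\vec{x} = (3, 1, 4, 1, 5)$ we have $m = 14$ and $n = 5$. The plain bit-vector stores 14 bits (plus the block counts), whereas $B(14, 5) = \lceil \log_2 715 \rceil = 10$ bits. This gap is what succinct and compressed representations aim to close.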

[1] David R. Clark et al. Efficient suffix trees on secondary storage. SODA '96, 1996.

[2] Michael L. Fredman et al. Trans-Dichotomous Algorithms for Minimum Spanning Trees and Shortest Paths. J. Comput. Syst. Sci., 1994.

[3] Rajeev Raman et al. Succinct indexable dictionaries with applications to encoding k-ary trees and multisets. SODA '02, 2002.

[4] Ian H. Witten et al. Managing Gigabytes, 2nd edition. 1999.

[5] Naila Rahman et al. Engineering the LOUDS Succinct Tree Representation. WEA, 2006.

[6] Peter Elias et al. Efficient Storage and Retrieval by Content and Address of Static Files. JACM, 1974.

[7] Roberto Grossi et al. Compressed Suffix Arrays and Suffix Trees with Applications to Text Indexing and String Matching. SIAM J. Comput., 2005.

[8] Roberto Grossi et al. Squeezing succinct data structures into entropy bounds. SODA '06, 2006.

[9] Joong Chae Na et al. Efficient Implementation of Rank and Select Functions for Succinct Representation. WEA, 2005.

[10] Torben Hagerup et al. Sorting and Searching on the Word RAM. STACS, 1998.

[11] Wing-Kai Hon et al. Compressed Dictionaries: Space Measures, Data Sets, and Experiments. WEA, 2006.

[12] Klaus Jansen et al. Experimental and Efficient Algorithms. Lecture Notes in Computer Science, 2003.

[13] Roberto Grossi et al. Compressed suffix arrays and suffix trees with applications to text indexing and string matching (extended abstract). STOC '00, 2000.

[14] Wing-Kai Hon et al. Compressed data structures: dictionaries and data-aware measures. Data Compression Conference (DCC '06), 2006.

[15] Naila Rahman et al. A simple optimal representation for balanced parentheses. Theor. Comput. Sci., 2006.

[16] Ian H. Witten et al. Managing Gigabytes. 1994.

[17] Torben Hagerup et al. Efficient Minimal Perfect Hashing in Nearly Minimal Space. STACS, 2001.