Compressed data structures: Dictionaries and data-aware measures

We propose measures for compressed data structures in which space usage is measured in a data-aware manner. In particular, we consider the fundamental dictionary problem on set data, where the task is to construct a data structure that represents a set S of n items out of a universe U = {0, ..., u − 1} and supports various queries on S. We use a well-known data-aware measure for set data called gap to bound the space of our data structures. We describe a novel dictionary structure taking gap + O(n log(u/n)/log n) + O(n log log(u/n)) bits. Under the RAM model, our dictionary supports membership, rank, select, and predecessor queries in nearly optimal time, matching the time bound of Andersson and Thorup's predecessor structure (2000) while simultaneously improving upon its space usage. The leading term of our space bound is exactly gap bits (i.e., the constant factor is 1). Since gap ≤ n log(u/n) + O(n) for every set, from the worst-case perspective we present the first O(n log(u/n))-bit dictionary structure that supports these queries in near-optimal time under the RAM model. We also build a dictionary that requires the same space and supports membership, select, and partial rank queries even faster, in O(log log n) time. To the best of our knowledge, this is the first result of its kind that achieves data-aware space usage while retaining near-optimal time.
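The following small Python sketch makes the gap measure and the query semantics concrete. It rests on several assumptions.

- It uses one common formulation of gap: gap = Σ_i (⌊log g_i⌋ + 1), with g_i = s_i − s_{i−1} and s_0 = −1. The paper's exact rounding and boundary conventions may differ.
- It assumes that rank(x) counts the elements ≤ x and that the predecessor of x is the largest element ≤ x.
- All function names are hypothetical.

The queries are answered by plain binary search over an uncompressed sorted list. This only pins down what each query returns; it is not the compressed dictionary described above.

```python
import bisect
import math


def gap_bits(s):
    """Assumed gap measure: sum over i of (floor(log2 g_i) + 1),
    where g_i = s_i - s_{i-1} and s_0 = -1 (so every gap is >= 1).
    floor(log2 g) + 1 is exactly the binary length of g."""
    total, prev = 0, -1
    for x in s:
        total += (x - prev).bit_length()
        prev = x
    return total


def info_bound_bits(u, n):
    """ceil(log2 C(u, n)): worst-case bits to encode any n-subset of [0, u)."""
    return (math.comb(u, n) - 1).bit_length()


# Reference semantics on an uncompressed sorted list (binary search only).
def member(s, x):
    i = bisect.bisect_left(s, x)
    return i < len(s) and s[i] == x


def rank(s, x):
    """Number of elements of S that are <= x (assumed convention)."""
    return bisect.bisect_right(s, x)


def select(s, i):
    """The i-th smallest element of S, 1-indexed."""
    return s[i - 1]


def predecessor(s, x):
    """Largest element of S that is <= x, or None if there is none."""
    r = bisect.bisect_right(s, x)
    return s[r - 1] if r > 0 else None


if __name__ == "__main__":
    u = 64
    S = [3, 4, 10, 25, 26, 40]  # sorted, clustered set of n = 6 keys
    n = len(S)
    print("gap =", gap_bits(S))                           # 16
    print("ceil(log2 C(u, n)) =", info_bound_bits(u, n))  # 27
    print("n log(u/n) + n =", round(n * math.log2(u / n) + n, 2))
    print(member(S, 10), member(S, 11))                    # True False
    print(rank(S, 25), select(S, 3), predecessor(S, 24))   # 4 10 10
```

On this clustered set, gap is 16 bits. By contrast, ⌈log C(64, 6)⌉ = 27 bits are needed to encode an arbitrary 6-element subset of a 64-element universe.

The n log(u/n) + n comparison shows why gap never exceeds the worst case by much. The gaps sum to at most u, so by concavity of log, Σ log g_i ≤ n log(u/n). Each term ⌊log g_i⌋ + 1 adds at most one bit beyond log g_i.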

[1] Roberto Grossi et al. IP Address Lookup Made Fast and Simple. ESA, 1999.

[2] Dan E. Willard. New Trie Data Structures Which Support Very Fast Search Operations. J. Comput. Syst. Sci., 1984.

[3] Faith Ellen et al. Optimal bounds for the predecessor problem. STOC '99, 1999.

[4] Ian H. Witten et al. Data compression in full-text retrieval systems. 1993.

[5] Gonzalo Navarro et al. Rank and select revisited and extended. Theor. Comput. Sci., 2007.

[6] Roberto Grossi et al. High-order entropy-compressed text indexes. SODA '03, 2003.

[7] Mikkel Thorup et al. Time-space trade-offs for predecessor search. STOC '06, 2006.

[8] Roberto Grossi et al. Compressed Suffix Arrays and Suffix Trees with Applications to Text Indexing and String Matching. SIAM J. Comput., 2005.

[9] Peter Elias. Universal codeword sets and representations of the integers. IEEE Trans. Inf. Theory, 1975.

[10] Guy Joseph Jacobson. Succinct static data structures. 1988.

[11] Shmuel Tomi Klein et al. Searching in compressed dictionaries. Proceedings DCC 2002, Data Compression Conference, 2002.

[12] Roberto Grossi et al. Compressed suffix arrays and suffix trees with applications to text indexing and string matching (extended abstract). STOC '00, 2000.

[13] Wing-Kai Hon et al. Compressed Dictionaries: Space Measures, Data Sets, and Experiments. WEA, 2006.

[14] Peter van Emde Boas et al. Design and implementation of an efficient priority queue. Mathematical systems theory, 1976.

[15] Michael L. Fredman et al. Surpassing the Information Theoretic Bound with Fusion Trees. J. Comput. Syst. Sci., 1993.

[16] Ian H. Witten et al. Managing Gigabytes: Compressing and Indexing Documents and Images. 1999.

[17] Guy E. Blelloch et al. Compact representations of ordered sets. SODA '04, 2004.

[18] Rasmus Pagh. Low Redundancy in Static Dictionaries with O(1) Worst Case Lookup Time. ICALP, 1999.

[19] Mikkel Thorup et al. Tight(er) worst-case bounds on dynamic searching and priority queues. STOC '00, 2000.

[20] Ian H. Witten et al. Managing gigabytes (2nd ed.): compressing and indexing documents and images. 1999.

[21] Rajeev Raman et al. Succinct indexable dictionaries with applications to encoding k-ary trees and multisets. SODA '02, 2002.

[22] Roberto Grossi et al. Squeezing succinct data structures into entropy bounds. SODA '06, 2006.

[23] J. Ian Munro et al. Membership in Constant Time and Almost-Minimum Space. SIAM J. Comput., 1999.

[24] Guy E. Blelloch et al. Dictionaries using variable-length keys and data, with applications. SODA '05, 2005.

[25] D. Knuth et al. Mathematics for the Analysis of Algorithms. 1999.