Improved Address-Calculation Coding of Integer Arrays

This paper deals with compressed integer arrays that support fast random access. Our treatment improves on an earlier approach that used address-calculation coding to locate the elements and supported access and search operations in $O(\lg (n+s))$ time for a sequence of $n$ non-negative integers summing to $s$. The idea is to complement the address-calculation method with index structures that considerably decrease access times and also enable updates. All our structures use $n \lg(1 + s/n) + O(n)$ bits of memory. First, we introduce a read-only version that supports rank-based access to elements and retrieval of prefix sums in $O(\lg \lg (n+s))$ time, as well as prefix-sum searches in $O(\lg n + \lg \lg s)$ time, with the word RAM as the model of computation. The second version supports accesses in $O(\lg\lg U)$ time and changes of element values in $O(\lg^2 U)$ time, where $U$ is the universe size. Both versions performed quite well in practical experiments. A third version, an extension to dynamic arrays, is also described; it supports accesses and prefix-sum searches in $O(\lg n + \lg\lg U)$ time, and insertions and deletions in $O(\lg^2 U)$ time.
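
To make the supported operations concrete, the following is a minimal uncompressed reference sketch, not the address-calculation structure of the paper. It only fixes what access, prefix sum, and prefix-sum search return, and evaluates the leading term of the memory bound. The class name `NaivePrefixSums`, the function `space_bound_bits`, 0-based indexing, and the convention that a search for `t` returns the smallest index whose prefix sum is at least `t` are all assumptions made for illustration.

```python
import math
from bisect import bisect_left
from itertools import accumulate


def space_bound_bits(n, s):
    """Leading term n * lg(1 + s/n) of the stated memory bound, in bits.

    The additive O(n) term is omitted.
    """
    return n * math.log2(1 + s / n)


class NaivePrefixSums:
    """Uncompressed reference semantics (hypothetical helper, 0-based)."""

    def __init__(self, values):
        self.values = list(values)                   # n non-negative integers
        self.prefix = list(accumulate(self.values))  # prefix[i] = values[0] + ... + values[i]

    def access(self, i):
        """Return the i-th element."""
        return self.values[i]

    def prefix_sum(self, i):
        """Return the sum of elements 0..i inclusive."""
        return self.prefix[i]

    def search(self, t):
        """Return the smallest i with prefix_sum(i) >= t (assumed convention).

        Binary search is valid because the prefix sums of non-negative
        integers are non-decreasing.
        """
        return bisect_left(self.prefix, t)


if __name__ == "__main__":
    a = NaivePrefixSums([3, 0, 5, 2])
    print(a.access(2), a.prefix_sum(2), a.search(4))
    print(space_bound_bits(4, 12))
```

This baseline stores every prefix sum explicitly and therefore uses far more than $n \lg(1 + s/n) + O(n)$ bits; it serves only as a specification against which a compressed implementation could be checked.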
