Compressed random access memory

Abstract

Motivated by applications which need to store huge amounts of data in the main memory of a computer, this paper proposes a new dynamic data structure for compressed random access memory. Ferragina and Venturini [SODA 2007, TCS 2007] recently gave a compressed data structure for storing a string that allows substrings to be retrieved efficiently, but it requires the string to be static. Here, we extend their results in a non-trivial way to also allow the stored compressed string to be modified during execution.

Our results are as follows. A memory (or string) T[1..n], where each character T[i] is of log σ bits, can be stored in nH_k(T) + O(n log σ · (k+1)(log σ + log log n) / log n) bits, where H_k(T) is the k-th order empirical entropy of T, such that (1) accessing T[i..j] takes optimal O(1 + (j − i)/log_σ n) time and (2) replacing T[i..i + log_σ n − 1] by another string of length log_σ n takes O(log n / log log n) time. We can also support insertion and deletion of log_σ n characters in O(log n / log log n) time at the cost of increasing the access time to O(log n / log log n), which matches a known lower bound. In addition, our key observation that the empirical entropy of a string does not change much after a small change to the string, and our simple yet efficient method for maintaining an array of variable-length blocks under length modifications, may be useful for many other applications as well.
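To make the quantity H_k(T) concrete, the following is a minimal Python sketch, assuming the commonly used definition H_k(T) = (1/n) Σ_w |T_w| H_0(T_w), where T_w collects the characters that follow each occurrence of the length-k context w in T. The function names `h0` and `empirical_entropy` are hypothetical and are not taken from the paper; the sketch only computes the entropy and says nothing about how the proposed data structure is built.

```python
import math
from collections import Counter, defaultdict


def h0(counts):
    """Zeroth-order empirical entropy, in bits per symbol, of a symbol count table."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def empirical_entropy(text, k):
    """k-th order empirical entropy H_k(text), in bits per symbol.

    Assumed definition: H_k(T) = (1/n) * sum over contexts w of |T_w| * H_0(T_w),
    where T_w is the sequence of symbols following occurrences of w in T.
    """
    n = len(text)
    if n == 0:
        return 0.0
    if k == 0:
        return h0(Counter(text))
    # Group each symbol by the k symbols that precede it.
    followers = defaultdict(Counter)
    for i in range(k, n):
        followers[text[i - k:i]][text[i]] += 1
    return sum(sum(c.values()) * h0(c) for c in followers.values()) / n


# Illustration: a highly regular string, then the same string with one symbol changed.
original = "ab" * 500
modified = original[:100] + "c" + original[101:]
for name, t in (("original", original), ("modified", modified)):
    print(name, "n*H_1 =", len(t) * empirical_entropy(t, 1), "bits")
```

Running the illustration shows n·H_1 moving from zero to only a handful of bits after a single-character change, which is the kind of behavior the abstract's key observation refers to; the sketch does not reproduce the paper's actual bound.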

[1]  Michael E. Saks, et al. The cell probe complexity of dynamic data structures, 1989, STOC '89.

[2]  Giovanni Manzini, et al. Compression of Low Entropy Strings with Lempel-Ziv Algorithms, 1999, SIAM J. Comput.

[3]  Giovanni Manzini, et al. Indexing compressed text, 2005, JACM.

[4]  Thomas M. Cover, et al. Elements of Information Theory, 2005.

[5]  Roberto Grossi, et al. High-order entropy-compressed text indexes, 2003, SODA '03.

[6]  Torben Hagerup, et al. Sorting and Searching on the Word RAM, 1998, STACS.

[7]  Roberto Grossi, et al. Squeezing succinct data structures into entropy bounds, 2006, SODA '06.

[8]  Rodrigo González, et al. Rank/select on dynamic compressed sequences and applications, 2009, Theor. Comput. Sci.

[9]  Steven J. M. Jones, et al. ABySS: a Parallel Assembler for Short Read Sequence Data, 2022.

[10]  Paolo Ferragina, et al. A simple storage scheme for strings achieving entropy bounds, 2007, SODA '07.

[11]  Gonzalo Navarro, et al. Dynamic entropy-compressed sequences and full-text indexes, 2006, TALG.

[12]  Gonzalo Navarro, et al. Compressed representations of sequences and full-text indexes, 2007, TALG.

[13]  Rajeev Raman, et al. An Efficient Quasidictionary, 2002, SWAT.

[14]  Rodrigo González, et al. Statistical Encoding of Succinct Data Structures, 2006, CPM.

[15]  J. Ian Munro, et al. Succinct Representations of Dynamic Strings, 2010, SPIRE.

[16]  Giovanni Manzini, et al. An analysis of the Burrows-Wheeler transform, 2001, SODA '99.

[17]  Rajeev Raman, et al. Succinct Dynamic Data Structures, 2001, WADS.