A Practical Alphabet-Partitioning Rank/Select Data Structure

This paper proposes a practical implementation of an alphabet-partitioning compressed data structure, which represents a string in compressed space and supports the fundamental operations \(\mathsf {rank}\) and \(\mathsf {select}\) efficiently. Our experimental results indicate that this implementation outperforms current realizations of the alphabet-partitioning approach, which is one of the most efficient approaches in practice. In particular, the time for operation \(\mathsf {select}\) can be reduced by about 80%, using only 11% more space than current alphabet-partitioning schemes. We also show the impact of our data structure on several applications, such as the intersection of inverted lists (where improvements of up to 60% are achieved, using only 2% extra space) and the distributed processing of \(\mathsf {rank}\) and \(\mathsf {select}\) operations. As far as we know, this is the first study of support for \(\mathsf {rank}\)/\(\mathsf {select}\) operations in a distributed-computing environment.
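For readers unfamiliar with the two operations, the following is a minimal, uncompressed sketch of what \(\mathsf {rank}\) and \(\mathsf {select}\) compute. It is not the alphabet-partitioning structure proposed here, only a naive linear-time reference for the semantics. The function names, the 0-based positions, the exclusive prefix bound in `rank`, and the `-1` return value for a missing occurrence are all assumptions made for illustration; conventions differ across the literature.

```python
def rank(s, c, i):
    """Count the occurrences of symbol c in the prefix s[:i] (i is exclusive)."""
    return s[:i].count(c)


def select(s, c, j):
    """Return the 0-based position of the j-th occurrence (j >= 1) of c in s.

    Returns -1 when s contains fewer than j occurrences of c
    (an illustrative convention, not taken from the paper).
    """
    pos = -1
    for _ in range(j):
        # Search for the next occurrence strictly after the previous one.
        pos = s.find(c, pos + 1)
        if pos == -1:
            return -1
    return pos


if __name__ == "__main__":
    text = "abracadabra"
    print(rank(text, "a", 4))    # occurrences of 'a' in "abra"
    print(select(text, "a", 3))  # position of the third 'a'
```

Under these conventions the two operations are inverses in the sense that `rank(s, c, select(s, c, j) + 1) == j` whenever the j-th occurrence exists. A compressed structure answers the same queries without scanning the string.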
