On Elias-Fano for Rank Queries in FM-Indexes

Funded in part by Academy of Finland grant 319454.

We describe methods to support fast rank queries on the Burrows-Wheeler transform (BWT) string $S$ of an input string $T$ on alphabet $\Sigma$, in order to support pattern counting queries. Our starting point is an approach previously adopted by several authors, which is to represent $S$ as $\vert \Sigma\vert$ bitvectors, where the bitvector for symbol $c$ has a 1 at position $i$ if and only if $S[i]=c$, with the bitvectors stored in Elias-Fano (EF) encodings to enable binary rank queries. We first show that the clustering of symbols induced by the BWT makes standard implementations of EF unattractive. We then engineer several improvements to EF that go some way toward alleviating this problem, and go on to describe two new EF-inspired bitvectors that have superior practical performance.
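To make the baseline concrete, here is a minimal Python sketch of the scheme described above: one sorted position list per symbol $c$, each stored in a textbook Elias-Fano layout (low parts of width $\lfloor \log_2(u/n) \rfloor$ bits, high parts in unary), with $\mathrm{rank}_c(S, i)$ answered as a binary rank and pattern counting done by standard backward search. All names (`EliasFano`, `EFRankBWT`, `bwt_naive`) are hypothetical, the BWT is built naively by sorting rotations with `$` as a unique smallest sentinel, and select0 on the upper bits is read from a precomputed list of zero positions instead of a succinct select structure. It illustrates the starting point only, not the authors' implementation or their improved variants.

```python
from typing import Dict, List


class EliasFano:
    """Textbook Elias-Fano encoding of sorted, distinct positions in [0, universe)."""

    def __init__(self, positions: List[int], universe: int) -> None:
        self.n = len(positions)
        self.universe = universe
        # Low part width: floor(log2(u / n)), or 0 when the set is dense.
        if self.n == 0 or universe <= self.n:
            self.low_width = 0
        else:
            self.low_width = (universe // self.n).bit_length() - 1
        mask = (1 << self.low_width) - 1
        self.low = [p & mask for p in positions]
        # High parts in unary: element i sets bit (p >> low_width) + i.
        upper_len = self.n + (universe >> self.low_width) + 1
        self.upper = [0] * upper_len
        for i, p in enumerate(positions):
            self.upper[(p >> self.low_width) + i] = 1
        # Stand-in for a select0 structure: positions of all zeros.
        self.zeros = [j for j, bit in enumerate(self.upper) if bit == 0]

    def rank(self, i: int) -> int:
        """Number of stored positions strictly less than i (0 <= i <= universe)."""
        if not 0 <= i <= self.universe:
            raise ValueError("i out of range")
        if self.n == 0:
            return 0
        h = i >> self.low_width
        target_low = i & ((1 << self.low_width) - 1)
        # Zero number h closes bucket h, so ones before it = elements with high <= h.
        start = 0 if h == 0 else self.zeros[h - 1] - (h - 1)  # high part < h
        end = self.zeros[h] - h                               # high part <= h
        # Inside bucket h the low parts are sorted: drop those >= target_low.
        k = end
        while k > start and self.low[k - 1] >= target_low:
            k -= 1
        return k


def bwt_naive(text: str) -> str:
    """BWT by sorting rotations; text must end with a unique smallest sentinel."""
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(r[-1] for r in rotations)


class EFRankBWT:
    """Rank over a BWT string using one Elias-Fano bitvector per symbol."""

    def __init__(self, bwt: str) -> None:
        self.n = len(bwt)
        occ: Dict[str, List[int]] = {}
        for i, c in enumerate(bwt):
            occ.setdefault(c, []).append(i)  # positions i with S[i] == c
        self.ef = {c: EliasFano(pos, self.n) for c, pos in occ.items()}
        # C[c] = number of symbols in the BWT that are smaller than c.
        self.C: Dict[str, int] = {}
        total = 0
        for c in sorted(occ):
            self.C[c] = total
            total += len(occ[c])

    def rank(self, c: str, i: int) -> int:
        """Occurrences of symbol c in bwt[0:i]."""
        ef = self.ef.get(c)
        return 0 if ef is None else ef.rank(i)

    def count(self, pattern: str) -> int:
        """Backward search over the half-open row interval [lo, hi)."""
        lo, hi = 0, self.n
        for c in reversed(pattern):
            if c not in self.C:
                return 0
            lo = self.C[c] + self.rank(c, lo)
            hi = self.C[c] + self.rank(c, hi)
            if lo >= hi:
                return 0
        return hi - lo


if __name__ == "__main__":
    index = EFRankBWT(bwt_naive("abracadabra$"))
    print(index.count("abra"))  # 2
```

In this sketch the final loop in `EliasFano.rank` is linear in the number of positions sharing a high part, so its cost depends on how the positions of a symbol are distributed, which is exactly what runs of equal symbols in the BWT affect.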
