Faster, Minuter

The FM index (Ferragina & Manzini, J. ACM, 2005) is a widely used compressed data structure that stores a string T in compressed form while also supporting fast pattern-matching queries. Fixed-block boosting is a relatively straightforward technique that achieves optimal index size in theory, but it remains unclear how best to translate the method into practice. In this paper we describe several new techniques for implementing fixed-block boosting efficiently. The new indexes are consistently fast and small relative to the state of the art, and thus make a good "off-the-shelf" choice for most applications.
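An FM index answers pattern-matching queries by backward search over the Burrows-Wheeler transform (BWT) of T, and backward search needs only `rank` queries on the BWT. The sketch below is a minimal illustration of that idea, not the paper's implementation. It splits the BWT into fixed-size blocks, stores the symbol counts at each block boundary, and answers `rank` as a stored count plus a scan inside one block. The names `build_bwt`, `BlockedFMIndex` and `block_size` are hypothetical. The BWT is built by naively sorting rotations, and the blocks are kept as plain strings. The per-block compression that gives fixed-block boosting its space bound is deliberately omitted.

```python
from typing import Dict, List


def build_bwt(text: str) -> str:
    """Naive BWT: sort all rotations and take the last column.

    Assumes text ends with a unique sentinel '$' that sorts before
    every other symbol. Quadratic space; for illustration only.
    """
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rot[-1] for rot in rotations)


class BlockedFMIndex:
    """Toy FM index whose rank support splits the BWT into fixed-size blocks."""

    def __init__(self, text: str, block_size: int = 4):
        assert text.endswith("$") and text.count("$") == 1
        assert block_size >= 1
        self.bwt = build_bwt(text)
        self.b = block_size
        alphabet = sorted(set(self.bwt))

        # C[c] = number of symbols in T that are strictly smaller than c.
        self.C: Dict[str, int] = {}
        total = 0
        for c in alphabet:
            self.C[c] = total
            total += self.bwt.count(c)

        # block_rank[k][c] = occurrences of c in bwt[0 : k * b],
        # for k = 0 .. len(bwt) // b.
        counts = {c: 0 for c in alphabet}
        self.block_rank: List[Dict[str, int]] = [dict(counts)]
        for i, ch in enumerate(self.bwt):
            counts[ch] += 1
            if (i + 1) % self.b == 0:
                self.block_rank.append(dict(counts))

    def rank(self, c: str, i: int) -> int:
        """Occurrences of c in bwt[0:i]: the count stored at the start of
        i's block plus a scan of the partial block up to position i."""
        k = i // self.b
        return self.block_rank[k].get(c, 0) + self.bwt[k * self.b:i].count(c)

    def count(self, pattern: str) -> int:
        """Number of occurrences of pattern in T, found by backward search."""
        lo, hi = 0, len(self.bwt)  # current suffix-array interval [lo, hi)
        for c in reversed(pattern):
            if c not in self.C:
                return 0
            lo = self.C[c] + self.rank(c, lo)
            hi = self.C[c] + self.rank(c, hi)
            if lo >= hi:
                return 0
        return hi - lo


if __name__ == "__main__":
    idx = BlockedFMIndex("abracadabra$", block_size=4)
    print(idx.bwt, idx.count("abra"))
```

Changing `block_size` affects only how `rank` is split between stored counts and in-block work, never the query answers. In a real fixed-block-boosted index, each block would be replaced by its own compressed structure supporting in-block `rank`.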