Enumeration of LCP values, LCP intervals and Maximal repeats in BWT-runs Bounded Space

Lcp-values, lcp-intervals, and maximal repeats are powerful tools in string processing and have a wide variety of applications. Although many researchers have developed enumeration algorithms for them, those algorithms are space-inefficient because their space usage is proportional to the length of the input string. Recently, the run-length-encoded Burrows-Wheeler transform (RLBWT) has attracted increased attention in string processing, and various algorithms on the RLBWT have been developed. Developing enumeration algorithms for lcp-intervals, lcp-values, and maximal repeats on the RLBWT, however, remains a challenge. In this paper, we present the first such enumeration algorithms with space usage not proportional to the string length. Our enumeration algorithms run in $O(n \log \log (n/r))$ time and use $O(r)$ words of working space, where $n$ is the string length and $r$ is the size of the RLBWT.
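To make the objects named above concrete, the following is a minimal brute-force sketch, not the algorithm of this paper. It computes lcp-values, the number of BWT runs, and the maximal repeats of a short string by sorting suffixes explicitly, so its space is proportional to the string length (and its time is far worse than the stated bounds). The function names and the assumption that the input ends with a unique sentinel `$` smaller than every other character are illustrative choices, not taken from the paper.

```python
from collections import defaultdict


def suffix_array(text):
    # Naive construction: sort suffix start positions by the suffix itself.
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_array(text, sa):
    # lcp[i] = length of the longest common prefix of the suffixes
    # starting at sa[i-1] and sa[i]; lcp[0] is defined as 0 here.
    def common(a, b):
        k = 0
        while a + k < len(text) and b + k < len(text) and text[a + k] == text[b + k]:
            k += 1
        return k

    return [0] + [common(sa[i - 1], sa[i]) for i in range(1, len(sa))]


def bwt(text, sa):
    # Character preceding each sorted suffix; text[-1] (the sentinel)
    # is used cyclically for the suffix starting at position 0.
    return "".join(text[i - 1] for i in sa)


def count_runs(s):
    # Number of maximal runs of equal characters (r for the BWT).
    return sum(1 for i, c in enumerate(s) if i == 0 or c != s[i - 1])


def maximal_repeats(text):
    # A substring occurring at least twice is a maximal repeat if its
    # occurrences are preceded by at least two distinct characters and
    # followed by at least two distinct characters.
    # Assumes text ends with a unique sentinel, so every repeated
    # substring has a following character.
    occurrences = defaultdict(list)
    n = len(text)
    for i in range(n):
        for j in range(i + 1, n + 1):
            occurrences[text[i:j]].append(i)
    result = set()
    for w, starts in occurrences.items():
        if len(starts) < 2:
            continue
        left = {text[i - 1] for i in starts}
        right = {text[i + len(w)] for i in starts}
        if len(left) >= 2 and len(right) >= 2:
            result.add(w)
    return result


if __name__ == "__main__":
    t = "banana$"
    sa = suffix_array(t)
    print(sa)                          # [6, 5, 3, 1, 0, 4, 2]
    print(lcp_array(t, sa))            # [0, 0, 1, 3, 0, 0, 2]
    print(bwt(t, sa))                  # annb$aa
    print(count_runs(bwt(t, sa)))      # 5
    print(sorted(maximal_repeats(t)))  # ['a', 'ana']
```

For "banana$" the BWT "annb$aa" has five runs, and the maximal repeats are "a" and "ana"; "an" and "na" repeat but can always be extended to the right or left, respectively, without losing an occurrence.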