Compact Data Structures for Shortest Unique Substring Queries

Given a string T of length n, a substring u = T[i..j] of T is called a shortest unique substring (SUS) for an interval [s, t] if (a) u occurs exactly once in T, (b) u contains the interval [s, t] (i.e., i \leq s \leq t \leq j), and (c) every substring v of T with |v| < |u| that contains [s, t] occurs at least twice in T. Given a query interval [s, t] \subseteq [1, n], the interval SUS problem is to output all the SUSs for [s, t]. In this article, we propose a data structure of 4n + o(n) bits that answers an interval SUS query in output-sensitive O(occ) time, where occ is the number of returned SUSs. We also consider the point SUS problem, which is the special case of the interval SUS problem with s = t. For this case, we propose a data structure of \lceil (\log_2 3 + 1)n \rceil + o(n) bits that answers a point SUS query in the same output-sensitive O(occ) time.
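To make conditions (a) to (c) concrete, the following is a minimal brute-force sketch of an interval SUS query in Python. It is not the compact data structure proposed in the article and does not achieve its space or time bounds; it only enumerates candidates directly from the definition. The function names `count_occurrences` and `interval_sus` are hypothetical, and positions are 1-based to match the notation above.

```python
def count_occurrences(text, pattern):
    """Count occurrences of pattern in text, including overlapping ones."""
    last_start = len(text) - len(pattern)
    return sum(1 for k in range(last_start + 1) if text.startswith(pattern, k))


def interval_sus(text, s, t):
    """Return all SUSs for the interval [s, t] as 1-based pairs (i, j).

    Brute force from the definition (illustrative only): try lengths in
    increasing order and stop at the first length for which some unique
    substring T[i..j] satisfies i <= s <= t <= j.
    """
    n = len(text)
    if not (1 <= s <= t <= n):
        raise ValueError("require 1 <= s <= t <= n")
    for length in range(t - s + 1, n + 1):
        found = []
        # i must satisfy i <= s, i + length - 1 >= t, and i + length - 1 <= n.
        lo = max(1, t - length + 1)
        hi = min(s, n - length + 1)
        for i in range(lo, hi + 1):
            j = i + length - 1
            if count_occurrences(text, text[i - 1:j]) == 1:
                found.append((i, j))
        if found:
            # Every shorter candidate containing [s, t] was non-unique.
            return found
    return []  # not reached for valid input: T itself occurs exactly once


if __name__ == "__main__":
    T = "abaab"
    print(interval_sus(T, 3, 3))  # point query at position 3
    print(interval_sus(T, 4, 5))  # interval query for [4, 5]
```

For T = abaab, the point query at position 3 returns two SUSs, T[2..3] = ba and T[3..4] = aa, which illustrates why the query time is stated in terms of occ, the number of reported answers.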
