Tight Bounds on the Maximum Number of Shortest Unique Substrings

A substring Q of a string S is called a shortest unique substring (SUS) for interval [s,t] in S if Q occurs exactly once in S, this occurrence of Q contains interval [s,t], and every substring of S that contains interval [s,t] and is shorter than Q occurs at least twice in S. The SUS problem is, given a string S, to preprocess S so that for any subsequent query interval [s,t], all the SUSs for interval [s,t] can be reported quickly. When s = t, we call the SUSs for [s,t] point SUSs, and when s <= t, we call the SUSs for [s,t] interval SUSs. There exists an optimal O(n)-time preprocessing scheme that answers queries in optimal O(k) time for both point and interval SUSs, where n is the length of S and k is the number of outputs for a given query. In this paper, we reveal structural, combinatorial properties underlying the SUS problem. Namely, we show that the number of intervals in S that correspond to point SUSs for all query positions in S is less than 1.5n, and that this bound is tight, i.e., it is matched by a lower bound. We also consider the maximum number of intervals in S that correspond to interval SUSs for all query intervals in S.
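To make the definition concrete, here is a minimal brute-force sketch in Python. It is an illustration only, not the O(n)-time preprocessing scheme mentioned above. The function names (`count_occurrences`, `sus`, `count_point_sus_intervals`) and the use of 0-based, inclusive positions are assumptions made for this example.

```python
def count_occurrences(S, Q):
    """Count occurrences of Q in S, including overlapping ones."""
    return sum(1 for i in range(len(S) - len(Q) + 1) if S.startswith(Q, i))


def sus(S, s, t):
    """Return all SUSs for the query interval [s, t] as (start, end) pairs.

    Positions are 0-based and inclusive (an assumption of this sketch).
    Candidate lengths are tried in increasing order, so the first length
    at which some unique substring covers [s, t] is the SUS length.
    """
    n = len(S)
    for length in range(t - s + 1, n + 1):
        first = max(0, t - length + 1)   # leftmost start that still reaches t
        last = min(s, n - length)        # rightmost start that still covers s
        found = [
            (i, i + length - 1)
            for i in range(first, last + 1)
            if count_occurrences(S, S[i:i + length]) == 1
        ]
        if found:
            return found
    return []  # only reached for an empty string


def count_point_sus_intervals(S):
    """Count distinct intervals that are a point SUS for some position of S."""
    intervals = set()
    for p in range(len(S)):
        intervals.update(sus(S, p, p))
    return len(intervals)


if __name__ == "__main__":
    print(sus("banana", 3, 3))                  # point query
    print(sus("banana", 1, 2))                  # interval query
    print(sus("abcbd", 1, 1))                   # a position with two SUSs
    print(count_point_sus_intervals("banana"))  # compare against 1.5 * n
```

For "banana" (n = 6), the point SUSs over all positions form a small set of distinct intervals, consistent with the stated bound of fewer than 1.5n. The sketch recomputes occurrence counts from scratch for every candidate, so it is suitable only for small inputs.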
