Weighted Shortest Common Supersequence Problem Revisited

A weighted string, also known as a position weight matrix, is a sequence of probability distributions over some alphabet. We revisit the Weighted Shortest Common Supersequence (WSCS) problem, introduced by Amir et al. [SPIRE 2011], that is, the SCS problem on weighted strings. In the WSCS problem, we are given two weighted strings $W_1$ and $W_2$ and a threshold $\frac{1}{z}$ on probability, and we are asked to compute the shortest (standard) string $S$ such that both $W_1$ and $W_2$ match subsequences of $S$ (not necessarily the same) with probability at least $\frac{1}{z}$. Amir et al. showed that this problem is NP-complete if the probabilities, including the threshold $\frac{1}{z}$, are represented by their logarithms (encoded in binary). We present an algorithm that solves the WSCS problem for two weighted strings of length $n$ over a constant-sized alphabet in $\mathcal{O}(n^2\sqrt{z} \log{z})$ time. Notably, our upper bound matches known conditional lower bounds stating that the WSCS problem cannot be solved in $\mathcal{O}(n^{2-\varepsilon})$ time or in $\mathcal{O}^*(z^{0.5-\varepsilon})$ time unless there is a breakthrough improving upon long-standing upper bounds for fundamental NP-hard problems (CNF-SAT and Subset Sum, respectively). We also discover a fundamental difference between the WSCS problem and the Weighted Longest Common Subsequence (WLCS) problem, introduced by Amir et al. [JDA 2010]. We show that the WLCS problem cannot be solved in $\mathcal{O}(n^{f(z)})$ time, for any function $f(z)$, unless $\mathrm{P}=\mathrm{NP}$.
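To make the problem statement concrete, the following is a minimal Python sketch of the WSCS definition on tiny inputs. It is an exponential brute force written only for illustration, not the algorithm presented in the paper. The representation of a weighted string as a list of dictionaries, and the names `best_match_probability`, `brute_force_wscs`, and `threshold`, are assumptions made for this example. A weighted string is taken to match a subsequence of $S$ with a given probability when some subsequence of $S$ of the same length has that product of per-position probabilities.

```python
from itertools import product


def best_match_probability(s, w):
    """Largest probability with which weighted string w matches a subsequence of s.

    w is a list of dicts (hypothetical representation): w[i][c] is the
    probability of character c at position i; missing characters count as 0.
    """
    m = len(w)
    # best[j] = best probability of matching w[:j] against a subsequence
    # of the prefix of s processed so far.
    best = [1.0] + [0.0] * m
    for ch in s:
        # Iterate right to left so each character of s is used at most once.
        for j in range(m, 0, -1):
            best[j] = max(best[j], best[j - 1] * w[j - 1].get(ch, 0.0))
    return best[m]


def brute_force_wscs(w1, w2, threshold, alphabet):
    """Shortest string s such that both w1 and w2 match subsequences of s
    with probability at least threshold, or None if no such string exists.

    Exhaustive search over candidate strings; exponential in the length.
    """
    shortest = max(len(w1), len(w2))
    # If any solution exists, concatenating one good string for each input
    # gives a solution of length len(w1) + len(w2), so that bound suffices.
    longest = len(w1) + len(w2)
    for length in range(shortest, longest + 1):
        for candidate in product(alphabet, repeat=length):
            s = "".join(candidate)
            if (best_match_probability(s, w1) >= threshold
                    and best_match_probability(s, w2) >= threshold):
                return s
    return None


if __name__ == "__main__":
    w1 = [{"a": 1.0}, {"a": 0.5, "b": 0.5}]
    w2 = [{"a": 0.5, "b": 0.5}, {"b": 1.0}]
    print(brute_force_wscs(w1, w2, 0.5, "ab"))  # prints "ab"
```

The example uses probabilities that are exact in binary floating point; for general inputs, exact rational arithmetic (for instance `fractions.Fraction`) would avoid rounding issues in the threshold comparison.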
