Minimum Common String Partition Problem: Hardness and Approximations

String comparison is a fundamental problem in computer science, with applications in areas such as computational biology, text processing, and compression. In this paper we address the minimum common string partition problem, a string comparison problem closely connected to sorting by reversals with duplicates, a key problem in genome rearrangement. A partition of a string A is a sequence ${\mathcal P}=(P_{1},P_{2},\dots,P_{m})$ of strings, called the blocks, whose concatenation is equal to A. Given a partition ${\mathcal P}$ of a string A and a partition ${\mathcal Q}$ of a string B, we say that the pair $\langle{\mathcal P},{\mathcal Q}\rangle$ is a common partition of A and B if ${\mathcal Q}$ is a permutation of ${\mathcal P}$. The minimum common string partition problem (MCSP) is to find a common partition of two strings A and B with the minimum number of blocks. The restricted version of MCSP in which each letter occurs at most k times in each input string is denoted by k-MCSP. In this paper, we show that 2-MCSP (and therefore MCSP) is NP-hard and, moreover, even APX-hard. We describe a 1.1037-approximation for 2-MCSP and a linear time 4-approximation algorithm for 3-MCSP. We are not aware of any better approximations.
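
To make the definitions concrete, the following Python sketch checks whether a pair of partitions is a common partition and finds a minimum common partition by exhaustive search. It is an illustration only, not one of the approximation algorithms of the paper: it runs in exponential time and is usable only on very short strings. The function names are hypothetical, and the sketch assumes that blocks are non-empty.

```python
from collections import Counter
from itertools import combinations


def is_common_partition(a, b, p, q):
    """Return True if p partitions a, q partitions b, and q is a permutation of p."""
    return "".join(p) == a and "".join(q) == b and Counter(p) == Counter(q)


def partitions(s, m):
    """Yield every partition of s into exactly m non-empty blocks."""
    # Choose m - 1 cut positions strictly inside the string.
    for cuts in combinations(range(1, len(s)), m - 1):
        bounds = (0, *cuts, len(s))
        yield [s[i:j] for i, j in zip(bounds, bounds[1:])]


def mcsp_brute_force(a, b):
    """Return a minimum common partition (p, q) of a and b, or None if none exists.

    Exhaustive search over all partitions; exponential in the string length.
    """
    # A common partition exists only if both strings use the same letters
    # with the same multiplicities.
    if Counter(a) != Counter(b):
        return None
    if not a:
        return [], []
    for m in range(1, len(a) + 1):
        # Index the partitions of a by their multiset of blocks.
        by_blocks = {}
        for p in partitions(a, m):
            by_blocks.setdefault(tuple(sorted(p)), p)
        # Look for a partition of b with the same multiset of blocks.
        for q in partitions(b, m):
            p = by_blocks.get(tuple(sorted(q)))
            if p is not None:
                return p, q
    return None


if __name__ == "__main__":
    p, q = mcsp_brute_force("abcab", "ababc")
    print(len(p), p, q)
```

For A = abcab and B = ababc, the strings differ, so no common partition has a single block, while the pair (abc, ab) and (ab, abc) is a common partition with two blocks; the minimum is therefore 2.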
