Exemplar Longest Common Subsequence

In this paper, we investigate the computational and approximation complexity of the Exemplar Longest Common Subsequence (ELCS) problem, a generalization of the Longest Common Subsequence problem in which the input sequences are over the union of two disjoint sets of symbols: a set of mandatory symbols and a set of optional symbols. We show that different versions of the problem are APX-hard even for instances with two sequences. Moreover, we show that the related problem of deciding whether a feasible solution exists for the ELCS of two sequences is NP-hard. On the positive side, we first present an efficient algorithm for the ELCS problem over instances of two sequences where each mandatory symbol appears at most three times in total across the two sequences. Furthermore, we present two fixed-parameter algorithms for the ELCS problem over instances of two sequences, where the parameter is the number of mandatory symbols.
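To make the problem statement concrete, the following is a minimal sketch of one variant on two sequences. It assumes that a feasible solution must contain every mandatory symbol at least once and that optional symbols are unrestricted; the abstract does not spell out this constraint, so treat it as an assumption. The function name `elcs_length` and the bitmask dynamic program are illustrative only and are not the algorithms of the paper. The running time is exponential in the number of mandatory symbols and polynomial in the sequence lengths.

```python
def elcs_length(a, b, mandatory):
    """Length of a longest common subsequence of a and b that contains
    every symbol in `mandatory` at least once (assumed variant), or None
    if no such common subsequence exists."""
    index = {c: k for k, c in enumerate(sorted(mandatory))}
    full = (1 << len(index)) - 1          # mask with all mandatory symbols
    NEG = float("-inf")                   # marks unreachable states

    # prev[j][mask]: best length over a[:i-1], b[:j] whose set of mandatory
    # symbols used is exactly `mask`.
    prev = [[NEG] * (full + 1) for _ in range(len(b) + 1)]
    for j in range(len(b) + 1):
        prev[j][0] = 0                    # the empty subsequence

    for i in range(1, len(a) + 1):
        cur = [[NEG] * (full + 1) for _ in range(len(b) + 1)]
        cur[0][0] = 0
        for j in range(1, len(b) + 1):
            row = cur[j]
            # Skip a[i-1] or skip b[j-1].
            for mask in range(full + 1):
                row[mask] = max(prev[j][mask], cur[j - 1][mask])
            # Match a[i-1] with b[j-1] and record it if it is mandatory.
            if a[i - 1] == b[j - 1]:
                bit = (1 << index[a[i - 1]]) if a[i - 1] in index else 0
                for mask in range(full + 1):
                    if prev[j - 1][mask] != NEG:
                        new = mask | bit
                        row[new] = max(row[new], prev[j - 1][mask] + 1)
        prev = cur

    best = prev[len(b)][full]
    return None if best == NEG else best


print(elcs_length("mxx", "xxm", set()))       # 2: plain LCS is "xx"
print(elcs_length("mxx", "xxm", {"m"}))       # 1: only "m" is feasible
print(elcs_length("ab", "ba", {"a", "b"}))    # None: no feasible solution
```

The second call shows how a mandatory symbol can force a solution shorter than the unconstrained LCS, and the third shows an instance with no feasible solution at all, the situation behind the feasibility question mentioned in the abstract.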
