The Parameterized Complexity of p-Center Approximate Substring Problems

Faculty of Computer Science, University of New Brunswick, Fredericton, NB, Canada. E-mail: {pevans,p7ka}@unb.ca
Department of Computer Science, Memorial University of Newfoundland, St. John's, NF, Canada. E-mail: harold@cs.mun.ca

Problems associated with finding strings that are within a specified Hamming distance of a given set of strings occur in several disciplines. All of the problems investigated are NP-hard and have varying levels of approximability. In this paper, we use techniques from parameterized computational complexity to assess non-polynomial time algorithmic options for three of these problems, namely p-exact substring (pes), approximate substring (1as), and p-approximate substring (pas). These problems differ in whether the substring must be an exact match, and in whether a single substring or a set of substrings (of cardinality p) is required. Our analyses indicate under which parameter restrictions useful algorithms are possible, and include both class membership results and parameterized reductions that prove class hardness. Since different parameter restrictions make different algorithms preferable, we give a variety of algorithms for the fixed-parameter tractable problem variations. One of these, for 1as with alphabet, substring length, and distance all fixed, improves on one of the best previously known exact algorithms (under these restrictions). Other algorithms solve previously unexplored parameterized variants. We also prove that pes is NP-hard, and show inapproximability for pes and pas.
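
To make the approximate substring problem (1as) concrete, the following is a minimal brute-force sketch in Python. It assumes the usual formulation, which the abstract does not spell out: given a set of strings, a length, and a distance d, find a string of that length that lies within Hamming distance d of some substring of every input string. The function names and the enumeration strategy are illustrative assumptions, not the algorithms of the paper. The running time is exponential in the substring length, with the alphabet size as the base, and polynomial in the input size, which is the kind of behaviour that fixing those parameters is meant to allow.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def within_distance(candidate, s, d):
    """True if some substring of s of len(candidate) is within distance d."""
    length = len(candidate)
    return any(
        hamming(candidate, s[i:i + length]) <= d
        for i in range(len(s) - length + 1)
    )


def approximate_substring(strings, alphabet, length, d):
    """Brute force (assumed formulation): try every string of the given
    length over the alphabet and return the first one that is within
    Hamming distance d of a substring of every input string, else None."""
    for letters in product(alphabet, repeat=length):
        candidate = "".join(letters)
        if all(within_distance(candidate, s, d) for s in strings):
            return candidate
    return None
```

With d = 0 the search reduces to finding a common exact substring of the given length; larger d values admit candidates that need not occur in any input string.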
