Separating sets of strings by finding matching patterns is almost always hard

We study the complexity of the problem of searching for a set of patterns that separates two given sets of strings. This problem has applications in a wide variety of areas, most notably data mining, computational biology, and understanding the complexity of genetic algorithms. We show that the basic problem of finding a small set of patterns such that every string in one set matches some pattern, while no string in a second set matches any pattern, is hard: it is NP-complete, W[2]-hard when parameterized by the size of the pattern set, and APX-hard. We then perform a detailed parameterized analysis of the problem, separating tractable from intractable variants. In particular, we show that the problem is fixed-parameter tractable (FPT) when parameterized by the size of the pattern set together with the number of strings, and when parameterized by the alphabet size together with the number of strings, among other results.
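The abstract does not spell out the pattern model, so the sketch below assumes one common choice: strings all have the same length, and a pattern is a string of that length over the alphabet plus a wildcard `*` that matches any single character. The names `matches`, `separates`, `merge` and `find_patterns` are hypothetical helpers for illustration. The solver is a naive exhaustive search, not the paper's algorithms. It relies on one observation: for any group of good strings, the pattern that keeps the characters on which the group agrees and puts `*` elsewhere is the most specific pattern matching the whole group, so if any pattern covers that group without matching a bad string, this one does too.

```python
from itertools import product
from typing import List, Optional, Sequence

WILDCARD = "*"  # assumption: '*' matches exactly one arbitrary character


def matches(pattern: str, s: str) -> bool:
    """True if pattern and s have equal length and agree at every non-wildcard position."""
    return len(pattern) == len(s) and all(
        p == WILDCARD or p == c for p, c in zip(pattern, s)
    )


def separates(patterns: Sequence[str], good: Sequence[str], bad: Sequence[str]) -> bool:
    """Every good string matches some pattern, and no bad string matches any pattern."""
    covers_good = all(any(matches(p, g) for p in patterns) for g in good)
    avoids_bad = not any(matches(p, b) for p in patterns for b in bad)
    return covers_good and avoids_bad


def merge(group: Sequence[str]) -> str:
    """Most specific pattern matching every string in a non-empty, equal-length group."""
    return "".join(
        col[0] if all(c == col[0] for c in col) else WILDCARD for col in zip(*group)
    )


def find_patterns(good: List[str], bad: List[str], k: int) -> Optional[List[str]]:
    """Exhaustively assign good strings to at most k groups (k**len(good) labelings).

    Returns a separating pattern set of size at most k, or None if none exists.
    """
    if not good:
        return []
    for labels in product(range(k), repeat=len(good)):
        groups = [[g for g, lab in zip(good, labels) if lab == i] for i in range(k)]
        patterns = [merge(grp) for grp in groups if grp]
        # Each merged pattern covers its own group, so only the bad strings need checking.
        if not any(matches(p, b) for p in patterns for b in bad):
            return patterns
    return None
```

For example, `find_patterns(["aab", "abb"], ["bab"], 1)` returns `["a*b"]`, while `find_patterns(["aa", "bb"], ["ab"], 1)` returns `None` because the only pattern covering both good strings, `"**"`, also matches `"ab"`. The running time grows roughly as k to the power of the number of good strings, times a polynomial, which hints at why combining the pattern-set size with the number of strings can lead to tractability; the paper's actual algorithms may differ.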
