Parameterized complexity analysis for the Closest String with Wildcards problem

The Closest String problem asks for a string s that is not too far from any string in a set of m input strings, where distance is measured as the Hamming distance. This well-studied problem has various applications in computational biology and drug design. In this paper, we introduce a new variant of Closest String in which the input strings can contain wildcards that match any letter of the alphabet, and the goal is to find a solution string without wildcards. We call this problem the Closest String with Wildcards problem, and we analyze it in the framework of parameterized complexity. For each natural parameterization, our study determines whether it yields a fixed-parameter algorithm or whether such an algorithm is highly unlikely to exist.
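
To make the problem statement concrete, the following is a minimal sketch of the wildcard-aware Hamming distance and an exhaustive search for a solution string. It is an illustration only, not an algorithm from the paper: the wildcard symbol `*`, the distance bound `d`, and the function names `wildcard_hamming` and `closest_string_bruteforce` are all assumptions introduced here, and the search is exponential in the string length.

```python
from itertools import product

WILDCARD = "*"  # assumed wildcard symbol; the abstract does not fix one


def wildcard_hamming(candidate: str, pattern: str) -> int:
    """Count positions where pattern holds a non-wildcard letter that differs from candidate."""
    if len(candidate) != len(pattern):
        raise ValueError("strings must have equal length")
    return sum(1 for c, p in zip(candidate, pattern) if p != WILDCARD and c != p)


def closest_string_bruteforce(patterns, alphabet, d):
    """Return a wildcard-free string within distance d of every pattern, or None.

    Tries every string over the alphabet, so it is only usable on tiny inputs.
    """
    length = len(patterns[0])
    for letters in product(alphabet, repeat=length):
        candidate = "".join(letters)
        if all(wildcard_hamming(candidate, p) <= d for p in patterns):
            return candidate
    return None
```

For example, under these assumptions `closest_string_bruteforce(["a*c", "ab*", "*bc"], "abc", 0)` finds `"abc"`, since every wildcard position is free to match the chosen letter.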
