On the Hardness of Counting and Sampling Center Strings

Given a set $S$ of $n$ strings, each of length $\ell$, and a nonnegative integer $d$, we define a center string as a string of length $\ell$ that has Hamming distance at most $d$ from each string in $S$. The #CLOSEST STRING problem asks for the number of center strings for a given set of strings $S$ and input parameters $n$, $\ell$, and $d$. We show that #CLOSEST STRING cannot be solved exactly in polynomial time unless P = NP, nor even approximated in polynomial time unless RP = NP. In contrast, restricting #CLOSEST STRING so that any one of the parameters $n$, $\ell$, or $d$ is fixed yields a fully polynomial-time randomized approximation scheme (FPRAS). We show analogous results for the problem of efficiently sampling center strings uniformly at random (u.a.r.).
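To make the definitions concrete, the following is a minimal brute-force sketch, not the FPRAS or sampler from the paper. It assumes a DNA alphabet {A, C, G, T}, since the abstract does not fix an alphabet, and all function names are hypothetical. It enumerates all $|\Sigma|^\ell$ candidate strings, so it takes exponential time and is only a reference for tiny instances. Uniform sampling is done by listing every center string and choosing one uniformly.

```python
import random
from itertools import product
from math import comb
from typing import List, Optional, Sequence

def hamming(a: str, b: str) -> int:
    """Number of positions at which equal-length strings a and b differ."""
    return sum(x != y for x, y in zip(a, b))

def is_center(t: str, S: Sequence[str], d: int) -> bool:
    """True if t is within Hamming distance d of every string in S."""
    return all(hamming(t, s) <= d for s in S)

def all_centers(S: Sequence[str], d: int, alphabet: str = "ACGT") -> List[str]:
    """Enumerate every center string by brute force (assumed alphabet: ACGT)."""
    ell = len(S[0])
    assert all(len(s) == ell for s in S), "all strings in S must have length ell"
    candidates = ("".join(t) for t in product(alphabet, repeat=ell))
    return [t for t in candidates if is_center(t, S, d)]

def count_centers(S: Sequence[str], d: int, alphabet: str = "ACGT") -> int:
    """Exact #CLOSEST STRING by exhaustive enumeration (exponential in ell)."""
    return len(all_centers(S, d, alphabet))

def sample_center(S: Sequence[str], d: int, alphabet: str = "ACGT",
                  seed: Optional[int] = None) -> Optional[str]:
    """Return a center string chosen u.a.r., or None if no center exists."""
    centers = all_centers(S, d, alphabet)
    return random.Random(seed).choice(centers) if centers else None

def ball_size(ell: int, d: int, q: int) -> int:
    """Closed-form count for n = 1: strings within distance d of one string."""
    return sum(comb(ell, i) * (q - 1) ** i for i in range(d + 1))
```

For a single string ($n = 1$) the count is $\sum_{i=0}^{d} \binom{\ell}{i}(|\Sigma|-1)^i$, which `ball_size` computes and which can serve as a sanity check: `count_centers(["ACGT"], 1)` and `ball_size(4, 1, 4)` both give 13. For `S = ["ACGT", "ACGA"]` with `d = 1`, the only center strings are those beginning with `ACG`, so the count is 4.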
