More Efficient Algorithms for Closest String and Substring Problems

The closest string and substring problems find applications in PCR primer design, genetic probe design, motif finding, and antisense drug design. Because of their importance, the two problems have recently been studied extensively in computational biology. Unfortunately, both problems are NP-complete. Researchers have developed both fixed-parameter algorithms and approximation algorithms for the two problems. For fixed-parameter algorithms with the radius $d$ as the parameter, the best-known algorithm for closest string has time complexity $O(nd^{d+1})$. This is still superpolynomial even if $d = O(\log n)$. In this paper we provide an $O(n|\Sigma|^{O(d)})$ algorithm, where $\Sigma$ is the alphabet. This gives a polynomial-time algorithm when $d = O(\log n)$ and $\Sigma$ has constant size. Using the same technique, we also provide a more efficient subexponential-time algorithm for the closest substring problem. For approximation, both closest string and closest substring admit polynomial-time approximation schemes (PTAS). The best previously known time complexity of the PTAS is $O(n^{O(\epsilon^{-2}\log(1/\epsilon))})$. In this paper we present a PTAS with time complexity $O(n^{O(\epsilon^{-2})})$. Finally, we prove that a restricted version of closest substring has the same parameterized complexity as closest substring, answering an open question in the literature.
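To make the closest string problem concrete, here is a minimal Python sketch of the well-known bounded-search-tree technique for the radius-parameterized version, due to Gramm, Niedermeier, and Rossmanith. It is the kind of approach behind $d^{d}$-type bounds such as the one cited in the abstract. It is not the paper's new $O(n|\Sigma|^{O(d)})$ algorithm. The function names `hamming` and `closest_string` are hypothetical, chosen for illustration only. The sketch assumes all input strings have equal length.

```python
from typing import List, Optional


def hamming(a: str, b: str) -> int:
    """Number of positions at which equal-length strings a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def closest_string(strings: List[str], d: int) -> Optional[str]:
    """Return some string within Hamming distance d of every input, or None.

    Bounded-search-tree sketch (illustrative, not the paper's algorithm):
    any valid center lies within distance d of strings[0], so the search
    starts there with a budget of d changes.
    """
    if not strings:
        return None
    length = len(strings[0])
    if any(len(s) != length for s in strings):
        raise ValueError("all input strings must have the same length")

    def search(candidate: str, budget: int) -> Optional[str]:
        # Invariant: if a center exists on this branch, it differs from
        # `candidate` in at most `budget` positions.
        far = next((s for s in strings if hamming(candidate, s) > d), None)
        if far is None:
            return candidate  # every input is within distance d: done
        if hamming(candidate, far) > d + budget:
            # Triangle inequality: no string within `budget` of `candidate`
            # can be within d of `far`, so prune this branch.
            return None
        diff = [i for i in range(length) if candidate[i] != far[i]]
        # A center must agree with `far` on at least one of any d + 1
        # positions where `candidate` and `far` differ, so branch on d + 1.
        for i in diff[: d + 1]:
            child = candidate[:i] + far[i] + candidate[i + 1:]
            found = search(child, budget - 1)
            if found is not None:
                return found
        return None

    return search(strings[0], d)


if __name__ == "__main__":
    print(closest_string(["AAAA", "CCAA", "ACCA"], 1))
```

For example, `closest_string(["AAAA", "CCAA", "ACCA"], 1)` returns `"ACAA"`, and the same input with `d = 0` returns `None`. The search tree has depth at most $d$ and branching factor $d + 1$, which is where a factor of roughly $(d+1)^d$ comes from. This sketch makes no attempt at the paper's $|\Sigma|^{O(d)}$ dependence.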
