Swiftly Computing Center Strings

Background
The center string (or closest string) problem is a classic computer science problem with important applications in computational biology. Given k input strings of equal length and a distance threshold d, we search for a string within Hamming distance at most d of each input string. This problem is NP-complete.

Results
In this paper, we focus on exact methods for the problem that are also swift in practice. We first introduce data reduction techniques that allow us to infer that certain instances have no solution, or that a center string must satisfy certain conditions. We describe how to use this information to speed up two previously published search tree algorithms. Then, we describe a novel iterative search strategy that is efficient in practice and to which some of our reduction techniques can also be applied. Finally, we present results of an evaluation study on two different data sets from a biological application.

Conclusions
We find that the running time for computing the optimal center string is dominated by the subroutine calls for d = d_opt - 1 and d = d_opt, where d_opt is the smallest distance for which a center string exists. Our data reduction is very effective for both calls, either rejecting unsolvable instances or solving trivial positions, and this speeds up computations considerably.
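The abstract does not spell out the two search tree algorithms, the reduction rules, or the iterative strategy. As a point of reference only, here is a minimal Python sketch of textbook ingredients. It assumes the well-known bounded search tree of Gramm, Niedermeier and Rossmanith for the decision version (is there a center within distance d?), one simple rejection rule based on the triangle inequality (two input strings more than 2d apart rule out any center), and a plain outer loop that increases d from that lower bound. All function names are hypothetical, and this is not the paper's method; its reductions and iterative search are presumably more refined.

```python
from itertools import combinations
from typing import List, Optional, Sequence, Tuple

# Hypothetical helper names; a sketch of textbook techniques, not the paper's code.


def hamming(a: Sequence[str], b: Sequence[str]) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def pairwise_lower_bound(strings: Sequence[str]) -> int:
    """Smallest d not ruled out by the triangle inequality: a center within
    distance d of both s_i and s_j forces hamming(s_i, s_j) <= 2 * d."""
    worst = max((hamming(a, b) for a, b in combinations(strings, 2)), default=0)
    return (worst + 1) // 2


def search(candidate: List[str], strings: Sequence[str], d: int, budget: int) -> Optional[str]:
    """Bounded search tree: change `candidate` at most `budget` more times,
    each time copying a character from an input string that is still too far."""
    dists = [hamming(candidate, s) for s in strings]
    if any(x > d + budget for x in dists):
        return None  # `budget` more changes cannot bring this string within d
    far = next((i for i, x in enumerate(dists) if x > d), None)
    if far is None:
        return "".join(candidate)  # every input string is within distance d
    if budget == 0:
        return None
    target = strings[far]
    diff = [p for p in range(len(target)) if candidate[p] != target[p]]
    # Any center within d of `target` agrees with it on at least one of any
    # d + 1 positions where candidate and target differ, so branching on
    # d + 1 of these positions cannot miss every center.
    for p in diff[: d + 1]:
        old = candidate[p]
        candidate[p] = target[p]
        found = search(candidate, strings, d, budget - 1)
        candidate[p] = old  # undo before trying the next branch
        if found is not None:
            return found
    return None


def has_center(strings: Sequence[str], d: int) -> Optional[str]:
    """Decision version: a center within distance d of every string, or None."""
    if pairwise_lower_bound(strings) > d:
        return None  # reduction rule: reject without searching
    # A center within d of strings[0] is reachable from it in at most d changes.
    return search(list(strings[0]), strings, d, d)


def closest_string(strings: Sequence[str]) -> Tuple[int, str]:
    """Increase d from the lower bound until the decision version succeeds."""
    d = pairwise_lower_bound(strings)
    while True:
        center = has_center(strings, d)
        if center is not None:
            return d, center
        d += 1
```

For example, `closest_string(["ACGT", "ACGA", "TCGT"])` returns `(1, "ACGT")`. The search tree has depth at most d and branches at most d + 1 ways per node, which gives the known O((d+1)^d) bound on its size. In an outer loop of this kind, the final decision calls are d = d_opt - 1 (the last failing one, whenever d_opt exceeds the lower bound) and d = d_opt (the first succeeding one).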
