On the string consensus problem and the Manhattan sequence consensus problem

Abstract. We study the Manhattan Sequence Consensus problem (MSC problem), in which we are given k integer sequences, each of length l, and are to find an integer sequence x of length l (called a consensus sequence) such that the maximum Manhattan distance of x from the input sequences is minimized. A related problem, with Hamming distance in place of Manhattan distance, is called Hamming String Consensus (HSC), also known as the string center problem or the closest string problem. For binary sequences the Manhattan distance coincides with the Hamming distance, hence in this case HSC is a special case of MSC. We design a practically efficient O(l)-time algorithm solving MSC for k ≤ 5 sequences. It improves upon the quadratic algorithm by Amir et al. (2012) [9] for HSC for k = 5 binary strings. As in the algorithm of Amir et al., we use a column-based framework. We replace the implied general integer linear programming by its easy special cases, which arise from combinatorial properties of MSC for k ≤ 5. The practicality of our algorithms has been verified experimentally. We also show that for a general parameter k any instance can be reduced in linear time to a kernel of size k!, so the problem is fixed-parameter tractable. Nevertheless, for k ≥ 4 this is still too large for any naive solution to be feasible in practice. This is a full version of an article published at SPIRE 2014 [22].
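The objective stated in the abstract can be made concrete with a small sketch. The Python below is not the O(l)-time algorithm of the paper; it is an exhaustive search, usable only on tiny inputs, that evaluates the maximum Manhattan distance directly. The function names (manhattan, radius, brute_force_consensus) are illustrative assumptions. The search restricts each coordinate of x to the range spanned by the inputs in that column, which is safe because moving a coordinate into that range never increases any of the distances.

```python
from itertools import product


def manhattan(a, b):
    """Manhattan distance between two equal-length integer sequences."""
    return sum(abs(p - q) for p, q in zip(a, b))


def radius(x, seqs):
    """MSC objective: the largest Manhattan distance from x to any input."""
    return max(manhattan(x, s) for s in seqs)


def brute_force_consensus(seqs):
    """Exhaustive search for a consensus sequence (illustration only).

    Each coordinate is tried over [min, max] of its column, so the
    running time is exponential in the sequence length.
    """
    columns = list(zip(*seqs))
    ranges = [range(min(c), max(c) + 1) for c in columns]
    return min(product(*ranges), key=lambda x: radius(x, seqs))


# Binary inputs: here Manhattan distance equals Hamming distance,
# so this instance is also an HSC instance.
binary = [(0, 0, 0), (1, 1, 0), (0, 1, 1)]
x = brute_force_consensus(binary)
print(x, radius(x, binary))
```

On the binary instance the Manhattan distance between any two inputs equals the number of positions in which they differ, which is the sense in which HSC on binary strings is a special case of MSC.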

[1] Bin Ma, et al. More Efficient Algorithms for Closest String and Substring Problems, 2009, SIAM J. Comput.

[2] Daniel Lokshtanov, et al. New Methods in Parameterized Algorithms and Complexity, 2009.

[3] Gérard D. Cohen, et al. Long packing and covering codes, 1997, IEEE Trans. Inf. Theory.

[4] Christina Boucher, et al. On the Structure of Small Motif Recognition Instances, 2008, SPIRE.

[5] Hendrik W. Lenstra, et al. Integer Programming with a Fixed Number of Variables, 1983, Math. Oper. Res.

[6] Jack Ritter, et al. An efficient bounding sphere, 1990.

[7] Piotr Indyk, et al. Approximate clustering via core-sets, 2002, STOC '02.

[8] Ravi Kannan, et al. Minkowski's Convex Body Theorem and Integer Programming, 1987, Math. Oper. Res.

[9] Amihood Amir, et al. Configurations and Minority in the String Consensus Problem, 2012, SPIRE.

[10] N. J. A. Sloane, et al. On the covering radius of codes, 1985, IEEE Trans. Inf. Theory.

[11] Amihood Amir, et al. On the hardness of the Consensus String problem, 2013, Inf. Process. Lett.

[12] Rolf Niedermeier, et al. Fixed-Parameter Algorithms for CLOSEST STRING and Related Problems, 2003, Algorithmica.

[13] Bin Ma, et al. On the closest string and substring problems, 2002, JACM.

[14] Bernd Gärtner, et al. Fast Smallest-Enclosing-Ball Computation in High Dimensions, 2003, ESA.

[15] Joseph S. B. Mitchell, et al. Computing Core-Sets and Approximate Smallest Enclosing HyperSpheres in High Dimensions, 2003, ALENEX.

[16] Bernd Gärtner, et al. An efficient, exact, and generic quadratic programming solver for geometric optimization, 2000, SCG '00.

[17] András Frank, et al. An application of simultaneous diophantine approximation in combinatorial optimization, 1987, Comb.

[18] A. Litman, et al. On covering problems of codes, 1997, Theory of Computing Systems.

[19] Arya Mazumdar, et al. On Chebyshev radius of a set in Hamming space and the closest string problem, 2013, 2013 IEEE International Symposium on Information Theory.

[20] Ronald L. Rivest, et al. Introduction to Algorithms, 1990.

[21] Alexandr Andoni, et al. On the Optimality of the Dimensionality Reduction Method, 2006, 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06).

[22] Wojciech Rytter, et al. On the String Consensus Problem and the Manhattan Sequence Consensus Problem, 2014, SPIRE.

[23] Bin Ma, et al. Distinguishing string selection problems, 2003, SODA '99.