On Computing Centroids According to the p-Norms of Hamming Distance Vectors

In this paper we consider the $p$-Norm Hamming Centroid problem, which asks whether a given set of binary strings has a centroid whose vector of Hamming distances to the strings has bounded $p$-norm. Specifically, given a set of strings $S$ and a real $k$, the task is to decide whether there exists a string $s^*$ with $\big(\sum_{s \in S} d^p(s^*,s)\big)^{1/p} \leq k$, where $d(\cdot,\cdot)$ denotes the Hamming distance. This problem has important applications in data clustering, and it generalizes both the well-known polynomial-time solvable \textsc{Consensus String} problem ($p=1$) and the NP-hard \textsc{Closest String} problem ($p=\infty$). Our main result shows that the problem is NP-hard for all fixed rational $p > 1$, closing the gap for all rational values of $p$ between $1$ and $\infty$. Under standard complexity assumptions, the reduction also implies that, for any fixed $p > 1$, the problem has no $2^{o(n+m)}$-time or $2^{o(k^{p/(p+1)})}$-time algorithm, where $m$ denotes the number of input strings and $n$ denotes the length of each string. Both running-time lower bounds are tight; in particular, we provide a $2^{k^{p/(p+1)+\varepsilon}}$-time algorithm for each fixed $\varepsilon > 0$. In the last part of the paper, we complement our hardness result by presenting a fixed-parameter algorithm and a factor-$2$ approximation algorithm for the problem.
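
To make the problem statement concrete, the following is a minimal Python sketch of the objective and of an exhaustive search over all $2^n$ candidate strings. It is an illustration only and is not one of the algorithms of the paper; the function names `hamming`, `p_norm_cost`, `brute_force_centroid`, and `has_centroid` are hypothetical and chosen here for exposition. The search is exponential in the string length $n$, so it is only usable on tiny instances.

```python
from itertools import product

def hamming(a, b):
    """Hamming distance between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def p_norm_cost(candidate, strings, p):
    """p-norm of the vector of Hamming distances from candidate to each string."""
    dists = [hamming(candidate, s) for s in strings]
    if p == float("inf"):
        return max(dists)  # p = infinity: Closest String objective
    return sum(d ** p for d in dists) ** (1.0 / p)

def brute_force_centroid(strings, p):
    """Try all 2^n binary strings and return (best candidate, its cost)."""
    n = len(strings[0])
    candidates = ("".join(bits) for bits in product("01", repeat=n))
    best = min(candidates, key=lambda c: p_norm_cost(c, strings, p))
    return best, p_norm_cost(best, strings, p)

def has_centroid(strings, p, k):
    """Decision version: is there a string s* with p-norm cost at most k?"""
    _, cost = brute_force_centroid(strings, p)
    return cost <= k
```

For example, with the strings `["0000", "0011", "1111"]`, calling `brute_force_centroid(strings, 1)` recovers the column-wise majority string, which is the consensus string for $p=1$, while `p = float("inf")` minimizes the maximum distance as in \textsc{Closest String}.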
